shard splitting
Hi,

I tried to split a shard but it failed, and if I try to do it again it does not start. I see the two extra shards in /collections/messages/leader_elect/ and /collections/messages/leaders/. How can I fix this?

root@solr07-dcg:/solr/messages_shard3_replica2# curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=messages&shard=shard3'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">300117</int>
  </lst>
  <lst name="error">
    <str name="msg">splitshard the collection time out:300s</str>
    <str name="trace">org.apache.solr.common.SolrException: splitshard the collection time out:300s
	at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:166)
	at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:300)
	at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
</str>
    <int name="code">500</int>
  </lst>
</response>

INFO - 2013-05-22 06:45:54.148; org.apache.solr.handler.admin.CoreAdminHandler; Invoked split action for core: messages_shard3_replica1
INFO - 2013-05-22 06:45:54.271; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partitions=2 segments=29
INFO - 2013-05-22 06:46:03.240; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partition #0 range=2aaa-5554

BR
Arkadi
solr starting time takes too long
Hi,

We are using Solr 3.6.1 and our application has many cores (more than 1,000). The problem is that Solr startup takes a long time (about 10 minutes). Examining the log file and the code, we found that many resources are loaded for each core, even though in our app we are sure we always use the same solrconfig.xml and schema.xml for all cores. While we can configure schema.xml to be shared, we cannot share the SolrConfig object. But looking inside the SolrConfig code, we do not use any of its caches. Could we somehow change the configuration (or the source code) to share resources between cores and reduce Solr startup time?

Thanks very much for your help,
Lisheng
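For what it's worth, newer solr.xml formats expose a shareSchema attribute on the <cores> element that lets cores with identical schema.xml share one parsed IndexSchema instead of re-parsing it per core. I am not certain this attribute is honoured in 3.6.1, so treat the sketch below as something to verify against your version's solr.xml documentation, not a confirmed fix:

```xml
<solr persistent="true">
  <!-- shareSchema="true" (where supported) caches the parsed IndexSchema
       and reuses it across cores whose schema.xml is identical, which can
       cut startup time substantially with very many cores -->
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="core0001" instanceDir="core0001" />
    <!-- ...one entry per core... -->
  </cores>
</solr>
```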
Re: solr starting time takes too long
Hi Lisheng,

I had the same problem when I enabled autoSoftCommit in solrconfig.xml. If you have it enabled, disabling it could fix your problem.

Cheers,
Carlos

2013/5/22 Zhang, Lisheng lisheng.zh...@broadvision.com:
> Hi, We are using solr 3.6.1, our application has many cores (more than 1K), the problem is that solr starting took a long time (10m). [...]
RE: solr starting time takes too long
Thanks very much for the quick help! I searched, but it seems that autoSoftCommit is a Solr 4.x feature and we are still using 3.6.1?

Best regards,
Lisheng

-----Original Message-----
From: Carlos Bonilla [mailto:carlosbonill...@gmail.com]
Sent: Wednesday, May 22, 2013 12:17 AM
To: solr-user@lucene.apache.org
Subject: Re: solr starting time takes too long

Hi Lisheng, I had the same problem when I enabled autoSoftCommit in solrconfig.xml. If you have it enabled, disabling it could fix your problem. Cheers, Carlos. [...]
Re: Boosting Documents
Thank you for your reply bbarani. I can't do that, because I want to boost some documents over others independently of the query.

On 05/21/2013 05:41 PM, bbarani wrote:
> Why don't you boost during query time? Something like q=superman&qf=title^2 subject
> You can refer: http://wiki.apache.org/solr/SolrRelevancyFAQ

--
View this message in context: http://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: shard splitting
clusterstate.json is now reporting shard3 as inactive. Any idea how to change clusterstate.json manually from the command line?

On 05/22/2013 08:59 AM, Arkadi Colson wrote:
> Hi, I tried to split a shard but it failed. If I try to do it again it does not start again. I see the two extra shards in /collections/messages/leader_elect/ and /collections/messages/leaders/. How can I fix this?
>
> root@solr07-dcg:/solr/messages_shard3_replica2# curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=messages&shard=shard3'
> [500 response: "splitshard the collection time out:300s" with full stack trace, quoted in the original message above] [...]
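One way to inspect and clean up ZooKeeper state by hand is ZooKeeper's own zkCli.sh. A hedged transcript sketch (host:port and the leftover node names are assumptions for this cluster; hand-editing cluster state is risky, since the Overseer can rewrite clusterstate.json, so verify paths with ls before removing anything):

```
# Connect with ZooKeeper's CLI (adjust host:port for your ensemble)
zkCli.sh -server localhost:2181

# Inside the zkCli shell: dump the current cluster state
get /clusterstate.json

# Inspect the leftover sub-shard nodes from the failed split,
# then remove them (node names below are illustrative)
ls /collections/messages/leader_elect
rmr /collections/messages/leader_elect/shard3_0
rmr /collections/messages/leader_elect/shard3_1
```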
RE: Upgrade Solr index from 4.0 to 4.2.1
My index is originally of version 4.0. My methods failed with this configuration, so I changed solrconfig.xml in my index to both versions, LUCENE_42 and LUCENE_41. For each version, in each method (loading and IndexUpgrader), I see the same errors as before. Thanks.

-----Original Message-----
From: Elran Dvir
Sent: Tuesday, May 21, 2013 6:48 PM
To: solr-user@lucene.apache.org
Subject: RE: Upgrade Solr index from 4.0 to 4.2.1

Why LUCENE_42? Why not LUCENE_41? Do I still need to run IndexUpgrader, or will just loading be enough? Thanks.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, May 21, 2013 2:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Upgrade Solr index from 4.0 to 4.2.1

This is always something that gives me a headache, but what happens if you change luceneMatchVersion in solrconfig.xml to LUCENE_40? I'm assuming it's LUCENE_42...

Best,
Erick

On Tue, May 21, 2013 at 5:48 AM, Elran Dvir elr...@checkpoint.com wrote:

Hi all, I have a 4.0 Solr (sharded/cored) index. I upgraded Solr to 4.2.1 and tried to load the existing index with it. I got the following exception:

May 21, 2013 12:03:42 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: other_2013-05-04
	at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345)
	at java.util.concurrent.FutureTask.run(FutureTask.java:177)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345)
	at java.util.concurrent.FutureTask.run(FutureTask.java:177)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1121)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
	at java.lang.Thread.run(Thread.java:779)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
	at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
	... 10 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435)
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:797)
	... 13 more
Caused by: org.apache.solr.common.SolrException: Error opening Reader
	at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
	at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183)
	at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179)
	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411)
	... 15 more
Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: actual codec=Lucene40StoredFieldsIndex vs expected codec=Lucene41StoredFieldsIndex (resource: MMapIndexInput(path="/var/solr/multicore_solr/other_2013-05-04/data/index/_3gfk.fdx"))
	at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:140)
	at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:130)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:102)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
	at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
	at
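For reference, Lucene's IndexUpgrader is run from the lucene-core jar on the command line. A sketch of the invocation (the jar path and version are assumptions for your installation; the index directory below is taken from the exception above; back up the index first, as the upgrade rewrites it in place):

```
java -cp lucene-core-4.2.1.jar \
  org.apache.lucene.index.IndexUpgrader \
  -verbose /var/solr/multicore_solr/other_2013-05-04/data/index
```

Note that IndexUpgrader upgrades the index to the format of the Lucene version on the classpath, so the jar version must match the Solr release you intend to run.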
Re: Boosting Documents
Hi Oussama,

This is explained very nicely on the Solr wiki:
http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22

All you need to do is something similar to the below:

<add>
  <doc boost="2.5">
    <field name="employeeId">05991</field>
    <field name="office" boost="2.0">Bridgewater</field>
  </doc>
</add>

What is not clear from your message is whether you need better scoring or better sorting. So, additionally, you can consider adding a secondary sort parameter for the docs having the same score:
http://wiki.apache.org/solr/CommonQueryParameters#sort

HTH,
Sandeep

On 22 May 2013 09:21, Oussama Jilal jilal.ouss...@gmail.com wrote:
> Thank you for your reply bbarani. I can't do that, because I want to boost some documents over others independently of the query. [...]
Re: [Solr 4.2.1] LotsOfCores - Can't query cores with loadOnStartup=true and transient=true
Hi Erick,

I opened an issue in JIRA: SOLR-4850. But I don't see how to change the assignee; I don't think I have permission to do it.

Thank you.
Best regards,
Lyuba

On Mon, May 20, 2013 at 6:05 PM, Erick Erickson erickerick...@gmail.com wrote:
> Lyuba: Could you go ahead and raise a JIRA and assign it to me to investigate? You should definitely be able to define cores this way. Thanks, Erick
>
> On Sun, May 19, 2013 at 9:27 AM, Lyuba Romanchuk lyuba.romanc...@gmail.com wrote:
>> Hi, It seems that in order to query transient cores they must be defined with loadOnStartup=false. I define one core with loadOnStartup=true and transient=false, the other cores with loadOnStartup=true and transient=true, and transientCacheSize=Integer.MAX_VALUE. In this case CoreContainer.dynamicDescriptors will be empty, and then CoreContainer.getCoreFromAnyList(String) and CoreContainer.getCore(String) return null for all transient cores. I looked at the code of 4.3.0 and it doesn't seem that the flow was changed; the core is added only if it's not loaded on startup. Could you please assist with this issue? Best regards, Lyuba
Re: Boosting Documents
Thank you Sandeep. I did post the document like that (a minor difference is that I did not add the boost to a field, since I don't want to boost a specific field; I boosted the whole document, <doc boost="2.0">...</doc>), but the issue is that everything in the query results has the same score, even though the documents were indexed with different boosts, and I can't sort on another field since this is independent of any field value. Any ideas?

On 05/22/2013 10:30 AM, Sandeep Mestry wrote:
> Hi Oussama, This is explained very nicely on the Solr wiki:
> http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22
> [...]
Re: Boosting Documents
I don't know if this is the issue or not, but considering this note from the wiki:

"NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for any fields where the index-time boost should be stored."

In my case, where I only need to boost the whole document (not a specific field), do I have to set omitNorms=false for all the fields in the schema?

On 05/22/2013 10:41 AM, Oussama Jilal wrote:
> Thank you Sandeep. I did post the document like that (a minor difference is that I did not add the boost to a field, since I don't want to boost a specific field; I boosted the whole document, <doc boost="2.0">...</doc>), but the issue is that everything in the query results has the same score, even though the documents were indexed with different boosts, and I can't sort on another field since this is independent of any field value. Any ideas? [...]
Regular expression in solr
Hi,

How do we search based upon regular expressions in Solr?

Regards,
Sagar

DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
Re: Boosting Documents
I think that is applicable only for field-level boosting, not for document-level boosting. Can you post your query, field definition, and the results you're expecting? I am using index- and query-time boosting without any issues so far. Also, which version of Solr are you using?

On 22 May 2013 10:44, Oussama Jilal jilal.ouss...@gmail.com wrote:
> I don't know if this is the issue or not, but considering this note from the wiki: "NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for any fields where the index-time boost should be stored." In my case, where I only need to boost the whole document (not a specific field), do I have to set omitNorms=false for all the fields in the schema? [...]
Re: Boosting Documents
I don't know if this can help (since the document boost should be independent of any schema), but here is my schema:

<?xml version="1.0" encoding="UTF-8"?>
<schema name="" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="long" class="solr.TrieLongField" sortMissingLast="true" precisionStep="0" positionIncrementGap="0" />
    <fieldType name="text" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="255" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="Id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="Suggestion" type="text" indexed="true" stored="true" multiValued="false" required="false" />
    <field name="Type" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="Sections" type="string" indexed="true" stored="true" multiValued="true" required="false" />
    <field name="_version_" type="long" indexed="true" stored="true" />
  </fields>
  <copyField source="Id" dest="Suggestion" />
  <uniqueKey>Id</uniqueKey>
  <defaultSearchField>Suggestion</defaultSearchField>
</schema>

My query is something like: Suggestion:"Olive Oil". The result is 9 documents, which all have the same score, 11.287682, even though they were indexed with different boosts (I am sure of this).

On 05/22/2013 10:54 AM, Sandeep Mestry wrote:
> I think that is applicable only for field-level boosting, not for document-level boosting. Can you post your query, field definition, and the results you're expecting? I am using index- and query-time boosting without any issues so far. Also, which version of Solr are you using? [...]
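A note on the mechanism under discussion in this thread: in Lucene/Solr of this era, an index-time document boost is multiplied into each field's boost and stored in that field's norm, so a field whose type sets omitNorms="true" (as the text type in the schema above does) silently discards the boost at index time. A hedged sketch of the change that would let the boost affect scoring on the queried field (re-indexing is required after changing it; this also re-enables length normalization, which may shift scores for other reasons):

```xml
<!-- omitNorms="false" keeps norms for fields of this type; the index-time
     document/field boost is folded into the norm and so survives into
     scoring, at the cost of one byte per field per document -->
<fieldType name="text" class="solr.TextField" sortMissingLast="true" omitNorms="false">
  <!-- analyzers unchanged from the schema above -->
</fieldType>
```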
Re: Regular expression in solr
You can write a regular expression query like this (you need to specify the regex between slashes, / /):

fieldName:/[rR]egular.*/

On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote:
> Hi, How do we search based upon regular expressions in Solr? Regards, Sagar [...]
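One detail worth knowing about the /regex/ syntax (available from Solr 4.x via the Lucene 4 query parser): the pattern is matched against whole indexed terms and is implicitly anchored at both ends, so it is not a substring search. A small Python sketch illustrating the anchored semantics with re.fullmatch as a stand-in (this models the matching behaviour only; it does not query Solr, and your field's analysis chain still determines what the indexed terms look like):

```python
import re

def lucene_style_matches(pattern, terms):
    """Mimic Lucene regexp-query semantics: the pattern must match
    the ENTIRE term, as field:/pattern/ does in a Solr query."""
    compiled = re.compile(pattern)
    return [t for t in terms if compiled.fullmatch(t)]

terms = ["regular", "Regular", "irregular", "regularity"]
# 'irregular' is excluded: the match is anchored, not a substring search
print(lucene_style_matches(r"[rR]egular.*", terms))
```

To match a term containing "regular" anywhere, the pattern itself must say so, e.g. fieldName:/.*regular.*/.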
Re: Boosting Documents
Did you use the debugQuery=true in solr console to see how the query is being interpreted and the result calculation? Also, I'm not sure but this copyfield directive seems a bit confusing to me.. copyField source=Id dest=Suggestion / Because multiValued is false for Suggestion field so does that schema mean Suggestion has value only from Id and not from any other input? You haven't mentioned the version of Solr, can you also post the query params? On 22 May 2013 11:04, Oussama Jilal jilal.ouss...@gmail.com wrote: I don't know if this can help (since the document boost should be independent of any schema) but here is my schema : |?xml version=1.0 encoding=UTF-8? schema name= version=1.5 types fieldType name=string class=solr.StrField sortMissingLast=true / fieldType name=long class=solr.TrieLongField sortMissingLast=true precisionStep=0 positionIncrementGap=0 / fieldType name=text class=solr.TextField sortMissingLast=true omitNorms=true analyzer type=index tokenizer class=solr.**KeywordTokenizerFactory / filter class=solr.**LowerCaseFilterFactory / filter class=solr.**EdgeNGramFilterFactory maxGramSize=255 / /analyzer analyzer type=query tokenizer class=solr.**KeywordTokenizerFactory / filter class=solr.**LowerCaseFilterFactory / /analyzer /fieldType /types fields field name=Id type=string indexed=true stored=true multiValued=false required=true / field name=Suggestion type=text indexed=true stored=true multiValued=false required=false / field name=Type type=string indexed=true stored=true multiValued=false required=true / field name=Sections type=string indexed=true stored=true multiValued=true required=false / field name=_version_ type=long indexed=true stored=true/ /fields copyField source=Id dest=Suggestion / uniqueKeyId/uniqueKey defaultSearchField**Suggestion/**defaultSearchField /schema| My query is somthing like : Suggestion:Olive Oil. 
The result is 9 documents, wich all has the same score 11.287682, even if they had been indexed with different boosts (I am sure of this). On 05/22/2013 10:54 AM, Sandeep Mestry wrote: I think that is applicable only for the field level boosting and not at document level boosting. Can you post your query, field definition and results you're expecting. I am using index and query time boosting without any issues so far. also which version of Solr you're using? On 22 May 2013 10:44, Oussama Jilal jilal.ouss...@gmail.com wrote: I don't know if this is the issue or not but, concidering this note from the wiki : NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for any fields where the index-time boost should be stored. In my case where I only need to boost the whole document (not a specific field), do I have to activate the omitNorms=false for all the fields in the schema ? On 05/22/2013 10:41 AM, Oussama Jilal wrote: Thank you Sandeep, I did post the document like that (a minor difference is that I did not add the boost to the field since I don't want to boost on specific field, I boosted the whole document 'doc boost=2.0 /doc'), but the issue is that everything in the queries results has the same score even if they had been indexed with different boosts, and I can't sort on another field since this is independent from any field value. Any ideas ? On 05/22/2013 10:30 AM, Sandeep Mestry wrote: Hi Oussama, This is explained very nicely on Solr Wiki.. 
http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22

All you need to do is something similar to the below:

  <add>
    <doc boost="2.5">
      <field name="employeeId">05991</field>
      <field name="office" boost="2.0">Bridgewater</field>
    </doc>
  </add>

What is not clear from your message is whether you need better scoring or better sorting. So, additionally, you can consider adding a secondary sort parameter for the docs having the same score.
RE: Regular expression in solr
@Oussama Thank you for your reply. Is it as simple as that? I mean, no additional settings required?

-----Original Message-----
From: Oussama Jilal [mailto:jilal.ouss...@gmail.com]
Sent: Wednesday, May 22, 2013 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Regular expression in solr

You can write a regular expression query like this (you need to specify the regex between slashes):

  fieldName:/[rR]egular.*/

On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote:
Hi, How do we search based upon regular expressions in Solr? Regards, Sagar

DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
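No additional settings are needed on the client side beyond URL-encoding the query, since the regex characters (slashes, brackets) are not URL-safe. A minimal sketch of building such a request URL, assuming a hypothetical local Solr instance and the fieldName:/[rR]egular.*/ example from above:

```python
from urllib.parse import urlencode

def regex_query_url(base, field, regex):
    # The regex goes between forward slashes inside the q parameter;
    # the whole query string must still be URL-encoded before sending.
    q = "{0}:/{1}/".format(field, regex)
    return base + "?" + urlencode({"q": q, "wt": "json"})

url = regex_query_url("http://localhost:8983/solr/select",
                      "fieldName", "[rR]egular.*")
```

The base URL and field name here are illustrative placeholders, not values from the thread.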
Re: Regular expression in solr
I don't think so, it always worked for me without anything special, just try it and see :)

On 05/22/2013 11:26 AM, Sagar Chaturvedi wrote:
@Oussama Thank you for your reply. Is it as simple as that? I mean, no additional settings required?
Re: Boosting Documents
Yes, I did debug it and there is nothing special about it; everything is treated the same. My Solr version is 4.2. The copyField is used because the two fields are of different types but only one value is indexed in them (so no multiValued is required, and it works perfectly).

On 05/22/2013 11:18 AM, Sandeep Mestry wrote:
Did you use debugQuery=true in the Solr console to see how the query is being interpreted and how the score is calculated? Also, I'm not sure, but this copyField directive seems a bit confusing to me. You haven't mentioned the version of Solr; can you also post the query params?
RE: Regular expression in solr
Yes, it works for me too. But many times the result is not as expected. Is there some guide on the use of regex in Solr?

-----Original Message-----
From: Oussama Jilal [mailto:jilal.ouss...@gmail.com]
Sent: Wednesday, May 22, 2013 4:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Regular expression in solr

I don't think so, it always worked for me without anything special, just try it and see :)
synonym indexing in solr
Hi, Since synonym searching has some limitations in Solr, I wanted to know the procedure for synonym indexing in Solr. Please let me know if any guide is available for that. Regards, Sagar
Re: Regular expression in solr
I am not sure, but I heard it works with the Java regex engine (a little obvious if it is true...), so any Java regex tutorial would help you.

On 05/22/2013 11:42 AM, Sagar Chaturvedi wrote:
Yes, it works for me too. But many times the result is not as expected. Is there some guide on the use of regex in Solr?
Re: synonym indexing in solr
Hello, I think that what is written about the SynonymFilterFactory in the wiki is well explained, so I will direct you there: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

On 05/22/2013 11:44 AM, Sagar Chaturvedi wrote:
Hi, Since synonym searching has some limitations in Solr, I wanted to know the procedure for synonym indexing in Solr.
Re: [custom data structure] aligned dynamic fields
Jack, Thanks for your response.

1. Flattening could be an option, although our scale and required functionality (runtime non-DocValues-backed facets) is beyond what Solr 3 can handle (billions of docs). We have flattened the metadata at the expense of over-generating Solr documents. But solving the problem I have described via flattening would have a big impact on scalability and price.

2. We have quite the opposite of what you have described about the dynamic fields: there will be very few per document. I agree that caution should be taken here, as we have suffered (or should I say experienced) having multivalued fields (the good thing is we never had to facet on them).

Any other options? Maybe someone can share their experience with dynamic fields and discourage us from pursuing this path? Dmitry

On Mon, May 20, 2013 at 4:23 PM, Jack Krupansky j...@basetechnology.com wrote:
Before you dive off the deep end and go crazy with dynamic fields, try a clean, simple, Solr-oriented static design. Yes, you CAN do an over-complicated design with dynamic fields, but that doesn't mean you should. In a single phrase: denormalize and flatten your design. Sure, that will lead to a lot of rows, but Solr and Lucene are designed to do well in that scenario. If you are still thinking in terms of C structs, go for a long walk or do SOMETHING else until you can get that idea out of your head. It is a sub-optimal approach for exploiting the power of Lucene and Solr. Stay with a static schema design until you hit... just stay with a static schema, period. Dynamic fields and multi-valued fields do have value, but only when used in moderation - small numbers. If you start down a design path and find that you are heavily dependent on dynamic fields and/or multi-valued fields with large numbers of values per document, that is feedback that your design needs to be denormalized and flattened further.
-- Jack Krupansky

-----Original Message-----
From: Dmitry Kan
Sent: Monday, May 20, 2013 7:06 AM
To: solr-user@lucene.apache.org
Subject: [custom data structure] aligned dynamic fields

Hi all, Our current project requirement suggests that we should start storing custom data structures in the Solr index. The custom data structure would be an equivalent of a C struct. The task is as follows. Suppose we have two types of fields, one is FieldName1 and the other FieldName2. Suppose also that we can have multiple pairs of these two fields on a document in Solr. That is, in the notation of dynamic fields:

  doc1:
    FieldName1_id1, FieldName2_id1
    FieldName1_id2, FieldName2_id2

  doc2:
    FieldName1_id3, FieldName2_id3
    FieldName1_id4, FieldName2_id4
    FieldName1_id5, FieldName2_id5

What we would like to have is a value for FieldName1_(some_unique_id) and a value for FieldName2_(some_unique_id) as input for search. That is, we wouldn't care about the some_unique_id in some search scenarios, and the search would automatically iterate the pairs of dynamic fields and respect the pairings. I know it used to be that, with dynamic fields, a client must provide the dynamically generated field names coupled with their values up front when searching. What data structure / solution could be used as an alternative approach to help such a structured search? Thanks, Dmitry
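Jack's denormalize-and-flatten advice can be sketched as follows (hypothetical field names; this is not code from the thread). One Solr document is emitted per struct pair, with a parent id tying pair members back to the original record, so a query can match FieldName1 and FieldName2 values of the same pair without knowing the ids up front:

```python
def flatten(doc_id, pairs):
    # Emit one child document per (FieldName1, FieldName2) pair.
    # The 'parent' field lets results be grouped back to the source record.
    return [
        {"id": "{0}_{1}".format(doc_id, i), "parent": doc_id,
         "FieldName1": f1, "FieldName2": f2}
        for i, (f1, f2) in enumerate(pairs)
    ]

docs = flatten("doc1", [("a", "x"), ("b", "y")])
```

A paired search then becomes an ordinary conjunction, e.g. FieldName1:a AND FieldName2:x, at the cost of more documents in the index (the trade-off Dmitry notes above).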
Re: Boosting Documents
I'm running out of options now; I can't really see the issue you're facing unless the debug analysis is posted. I think a thorough debugging is required at both the application and Solr level. If you want customized scoring from Solr, you can also consider overriding the DefaultSimilarity implementation - but that'll be a separate issue.

On 22 May 2013 11:32, Oussama Jilal jilal.ouss...@gmail.com wrote:
Yes, I did debug it and there is nothing special about it; everything is treated the same. My Solr version is 4.2. The copyField is used because the two fields are of different types but only one value is indexed in them (so no multiValued is required, and it works perfectly).
Re: Regular expression in solr
I just can't get the $ anchor to work.

On 05/22/2013, Oussama Jilal wrote:
I am not sure, but I heard it works with the Java regex engine (a little obvious if it is true...), so any Java regex tutorial would help you.

-- Stéphane Roux hab...@habett.org http://habett.net
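A possible explanation for the $ problem, offered as a hedged aside rather than anything stated in the thread: Lucene's regexp queries match entire indexed terms, i.e. the pattern is implicitly anchored at both ends, and the Lucene RegExp syntax does not support ^ or $ at all. The behavior is closer to Python's re.fullmatch than to re.search, which the following sketch illustrates by analogy:

```python
import re

term = "regular"

# Lucene regexp queries behave like fullmatch on each term:
# the pattern must cover the whole term, so $ is unnecessary.
assert re.fullmatch("[rR]egular.*", term) is not None

# To match a suffix, lead with .* instead of trailing with $.
assert re.fullmatch(".*lar", term) is not None
assert re.fullmatch("lar", term) is None
```

So a query like fieldName:/.*lar/ would play the role that a trailing $ plays in an unanchored engine.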
Re: Boosting Documents
OK, thank you for your help. I think I will have to treat the problem in another way, even if it will complicate things for me. Thanks again.

On 05/22/2013 11:51 AM, Sandeep Mestry wrote:
I'm running out of options now; I can't really see the issue you're facing unless the debug analysis is posted. I think a thorough debugging is required at both the application and Solr level. If you want customized scoring from Solr, you can also consider overriding the DefaultSimilarity implementation - but that'll be a separate issue.
RE: synonym indexing in solr
Thanks. Already used it. Quite easy to setup. But it tells how to setup Synonym search. I am asking about synonym indexing. -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:18 PM To: solr-user@lucene.apache.org Subject: Re: synonym indexing in solr Hello, I think that what is written about the SynonymFilterFactory in the wiki is well explained, so I will direct you there : http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory On 05/22/2013 11:44 AM, Sagar Chaturvedi wrote: Hi, Since synonym searching has some limitations in solr, so I wanted to know the procedure of Synonym indexing in solr? Please let me know if any guide is available for that. Regards, Sagar DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
Re: Regular expression in solr
There is no ^ or $ in the solr regex since the regular expression will match tokens (not the complete indexed text). So the results you get will basically depend on your way of indexing; if you use the regex on a tokenized field and that is not what you want, try to use a copy field which is not tokenized and then use the regex on that one. On 05/22/2013 11:53 AM, Stéphane Habett Roux wrote: I just can't get the $ endpoint to work. I am not sure but I heard it works with the Java Regex engine (a little obvious if it is true ...), so any Java regex tutorial would help you. On 05/22/2013 11:42 AM, Sagar Chaturvedi wrote: Yes, it works for me too. But many times the result is not as expected. Is there some guide on the use of regex in solr? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:00 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr I don't think so, it always worked for me without anything special, just try it and see :) On 05/22/2013 11:26 AM, Sagar Chaturvedi wrote: @Oussama Thank you for your reply. Is it as simple as that? I mean, no additional settings required? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 3:37 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr You can write a regular expression query like this (you need to specify the regex between slashes / ) : fieldName:/[rR]egular.*/ On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote: Hi, How do we search based upon regular expressions in solr? Regards, Sagar DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates.
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
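A hedged sketch of sending such a regex query over HTTP (the field name is illustrative): the pattern sits between slashes and, since Solr regexes match whole tokens, no ^/$ anchors are used, but the query string still needs URL encoding before going into the request URL:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RegexQueryDemo {
    public static void main(String[] args) {
        // Solr regex syntax: the pattern is placed between slashes and is
        // matched against individual tokens, so ^ and $ are not needed.
        String q = "fieldName:/[rR]egular.*/";

        // URL-encode the query before appending it as /select?q=...
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        System.out.println(encoded); // fieldName%3A%2F%5BrR%5Degular.*%2F
    }
}
```

Remember that the regex runs against analyzed tokens, so the field's analysis chain determines what the pattern can actually match.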
Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core
Sandeep: You need to be a little careful here, I second Shawn's comment that you are mixing versions. You say you are using solr 4.0. But the jar that ships with that is apache-solr-core-4.0.0.jar. Then you talk about using solr-core, which is called solr-core-4.1.jar. Maven is not officially supported, so grabbing some solr-core.jar (with no apache) and doing _anything_ with it from a 4.0 code base is not a good idea. You can check out the 4.0 code branch and just compile the whole thing. Or you can get a new 4.0 distro and use the jars there. But I'd be _really_ cautious about using a 4.1 or later jar with 4.0. FWIW, Erick On Tue, May 21, 2013 at 12:05 PM, Sandeep Mestry sanmes...@gmail.com wrote: Thanks Steve, I could find solr-core.jar in the repo but could not find apache-solr-core.jar. I think my issue got misunderstood - which is totally my fault. Anyway, I took into account Shawn's comment and will use solr-core.jar only for compiling the project - not for deploying. Thanks, Sandeep On 21 May 2013 16:46, Steve Rowe sar...@gmail.com wrote: The 4.0 solr-core jar is available in Maven Central: http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar Steve On May 21, 2013, at 11:26 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Steve, Solr 4.0 - mentioned in the subject.. :-) Thanks, Sandeep On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote: Sandeep, What version of Solr are you using? Steve On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Shawn, Thanks for your reply. I'm not mixing versions. The problem I faced is I want to override Highlighter from solr-core jar and if I add that as a dependency in my project then there was a clash between solr-core.jar and the apache-solr-core.jar that comes bundled within the solr distribution. It was complaining about MorfologikFilterFactory classcastexception. I can't use apache-solr-core.jar as a dependency as no such jar exists in any maven repo. 
The only thing I could do is to remove apache-solr-core.jar from solr.war and then use solr-core.jar as a dependency - however I do not think this is the ideal solution. Thanks, Sandeep On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote: On 5/20/2013 8:01 AM, Sandeep Mestry wrote: And I do remember the discussion on the forum about dropping the name *apache* from solr jars. If that's what caused this issue, then can you tell me if the mirrors need updating with solr-core.jar instead of apache-solr-core.jar? If it's named apache-solr-core, then it's from 4.0 or earlier. If it's named solr-core, then it's from 4.1 or later. That might mean that you are mixing versions - don't do that. Make sure that you have jars from the exact same version as your server. Thanks, Shawn
Re: Upgrade Solr index from 4.0 to 4.2.1
LUCENE_40 since your original index was built with 4.0. As for the other, I'll defer to people who actually know what they're talking about. Best Erick On Wed, May 22, 2013 at 5:19 AM, Elran Dvir elr...@checkpoint.com wrote: My index is originally of version 4.0. My methods failed with this configuration. So, I changed solrconfig.xml in my index to both versions: LUCENE_42 and LUCENE_41. For each version in each method (loading and IndexUpgrader), I see the same errors as before. Thanks. -Original Message- From: Elran Dvir Sent: Tuesday, May 21, 2013 6:48 PM To: solr-user@lucene.apache.org Subject: RE: Upgrade Solr index from 4.0 to 4.2.1 Why LUCENE_42? Why not LUCENE_41? Do I still need to run IndexUpgrader, or will just loading be enough? Thanks. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, May 21, 2013 2:52 PM To: solr-user@lucene.apache.org Subject: Re: Upgrade Solr index from 4.0 to 4.2.1 This is always something that gives me a headache, but what happens if you change luceneMatchVersion in solrconfig.xml to LUCENE_40? I'm assuming it's LUCENE_42... Best Erick On Tue, May 21, 2013 at 5:48 AM, Elran Dvir elr...@checkpoint.com wrote: Hi all, I have a 4.0 Solr (sharded/cored) index. I upgraded Solr to 4.2.1 and tried to load the existing index with it.
I got the following exception: May 21, 2013 12:03:42 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: other_2013-05-04 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345) at java.util.concurrent.FutureTask.run(FutureTask.java:177) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345) at java.util.concurrent.FutureTask.run(FutureTask.java:177) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1121) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang.Thread.run(Thread.java:779) Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.init(SolrCore.java:822) at org.apache.solr.core.SolrCore.init(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 10 more Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547) at org.apache.solr.core.SolrCore.init(SolrCore.java:797) ... 
13 more Caused by: org.apache.solr.common.SolrException: Error opening Reader at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:183) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:179) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411) ... 15 more Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: actual codec=Lucene40StoredFieldsIndex vs expected codec=Lucene41StoredFieldsIndex (resource: MMapIndexInput(path=/var/solr/multicore_solr/other_2013-05-04/data/index/_3gfk.fdx)) at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:140) at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:130) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.init(CompressingStoredFieldsReader.java:102) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113) at org.apache.lucene.index.SegmentCoreReaders.init(SegmentCoreReaders.java:147) at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:56) at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62) at
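A hedged sketch of the IndexUpgrader route discussed in this thread (the jar name is illustrative; the index path is the one from the stack trace). The tool rewrites the index in the format of the Lucene version on the classpath, so back up the index directory first:

```
java -cp lucene-core-4.2.1.jar org.apache.lucene.index.IndexUpgrader \
  -delete-prior-commits -verbose /var/solr/multicore_solr/other_2013-05-04/data/index
```

Note that the codec-mismatch error above suggests some segments may already have been written with a mixed format, in which case upgrading alone may not be sufficient.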
Re: [Solr 4.2.1] LotsOfCores - Can't query cores with loadOnStartup=true and transient=true
Thanks, I saw that and assigned it to myself. On the original form when you create the issue, there's an assign to entry field, but I don't know whether you see the same thing Best Erick On Wed, May 22, 2013 at 5:36 AM, Lyuba Romanchuk lyuba.romanc...@gmail.com wrote: Hi Erick, I opened an issue in JIRA: SOLR-4850. But I don't see how to change an assignee, I don't think that I have permissions to do it. Thank you. Best regards, Lyuba On Mon, May 20, 2013 at 6:05 PM, Erick Erickson erickerick...@gmail.comwrote: Lyuba: Could you go ahead and raise a JIRA and assign it to me to investigate? You should definitely be able to define cores this way. Thanks, Erick On Sun, May 19, 2013 at 9:27 AM, Lyuba Romanchuk lyuba.romanc...@gmail.com wrote: Hi, It seems like in order to query transient cores they must be defined with loadOnStartup=false. I define one core loadOnStartup=true and transient=false, and another cores to be loadOnStartup=true and transient=true, and transientCacheSize=Integer.MAX_VALUE. In this case CoreContainer.dynamicDescriptors will be empty and then CoreContainer.getCoreFromAnyList(String) and CoreContainer.getCore(String) returns null for all transient cores. I looked at the code of 4.3.0 and it doesn't seem that the flow was changed, the core is added only if it's not loaded on start up. Could you please assist with this issue? Best regards, Lyuba
Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core
Thanks Erick for your suggestion. Turns out I won't be going that route after all as the highlighter component is quite complicated - to follow and to override - and not much time left in hand so did it the manual (dirty) way. Best Regards, Sandeep On 22 May 2013 12:21, Erick Erickson erickerick...@gmail.com wrote: Sandeep: You need to be a little careful here, I second Shawn's comment that you are mixing versions. You say you are using solr 4.0. But the jar that ships with that is apache-solr-core-4.0.0.jar. Then you talk about using solr-core, which is called solr-core-4.1.jar. Maven is not officially supported, so grabbing some solr-core.jar (with no apache) and doing _anything_ with it from a 4.0 code base is not a good idea. You can check out the 4.0 code branch and just compile the whole thing. Or you can get a new 4.0 distro and use the jars there. But I'd be _really_ cautious about using a 4.1 or later jar with 4.0. FWIW, Erick On Tue, May 21, 2013 at 12:05 PM, Sandeep Mestry sanmes...@gmail.com wrote: Thanks Steve, I could find solr-core.jar in the repo but could not find apache-solr-core.jar. I think my issue got misunderstood - which is totally my fault. Anyway, I took into account Shawn's comment and will use solr-core.jar only for compiling the project - not for deploying. Thanks, Sandeep On 21 May 2013 16:46, Steve Rowe sar...@gmail.com wrote: The 4.0 solr-core jar is available in Maven Central: http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar Steve On May 21, 2013, at 11:26 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Steve, Solr 4.0 - mentioned in the subject.. :-) Thanks, Sandeep On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote: Sandeep, What version of Solr are you using? Steve On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Shawn, Thanks for your reply. I'm not mixing versions.
The problem I faced is I want to override Highlighter from solr-core jar and if I add that as a dependency in my project then there was a clash between solr-core.jar and the apache-solr-core.jar that comes bundled within the solr distribution. It was complaining about MorfologikFilterFactory classcastexception. I can't use apache-solr-core.jar as a dependency as no such jar exists in any maven repo. The only thing I could do is to remove apache-solr-core.jar from solr.war and then use solr-core.jar as a dependency - however I do not think this is the ideal solution. Thanks, Sandeep On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote: On 5/20/2013 8:01 AM, Sandeep Mestry wrote: And I do remember the discussion on the forum about dropping the name *apache* from solr jars. If that's what caused this issue, then can you tell me if the mirrors need updating with solr-core.jar instead of apache-solr-core.jar? If it's named apache-solr-core, then it's from 4.0 or earlier. If it's named solr-core, then it's from 4.1 or later. That might mean that you are mixing versions - don't do that. Make sure that you have jars from the exact same version as your server. Thanks, Shawn
Solr Faceting doesn't return values.
Hello, I have a field defined in my schema.xml like so:

<field name="sa_site_city" type="string" indexed="true" stored="true"/>

string is a type:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

When I run the query for faceting data by the city:

http://XX.XX.XX.XX/solr/collection1/select?q=mm_state_code&wt=json&indent=true&facet=true&facet.field=sa_site_city

I get an empty result like so:

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "facet":"true",
      "indent":"true",
      "q":"mm_state_code",
      "facet.field":"sa_site_city",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "sa_site_city":[]},
    "facet_dates":{},
    "facet_ranges":{}}}

I wonder what am I doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr starting time takes too long
Zhang: In 3.6, there's really no choice except to load all the cores on startup. 10 minutes still seems excessive, do you perhaps have a heavy-weight firstSearcher query? Yes, soft commits are 4.x only, so that's not your problem. There's a shareSchema option that tries to only load 1 copy of the schema that should help, but that doesn't help with loading solrconfig.xml. Also in the 4.3+ world there's the option to lazily-load cores, see: http://wiki.apache.org/solr/LotsOfCores for the overview. Perhaps not an option, but I thought I'd mention it. But I'm afraid you're stuck. You might be able to run bigger hardware (perhaps you're memory-starved). Other than that, you may need to use more than one machine to get fast enough startup times. Best, Erick On Wed, May 22, 2013 at 3:27 AM, Zhang, Lisheng lisheng.zh...@broadvision.com wrote: Thanks very much for quick helps! I searched but it seems that autoSoftCommit is solr 4x feature and we are still using 3.6.1? Best regards, Lisheng -Original Message- From: Carlos Bonilla [mailto:carlosbonill...@gmail.com] Sent: Wednesday, May 22, 2013 12:17 AM To: solr-user@lucene.apache.org Subject: Re: solr starting time takes too long Hi Lisheng, I had the same problem when I enabled the autoSoftCommit in solrconfig.xml. If you have it enabled, disabling it could fix your problem, Cheers. Carlos. 2013/5/22 Zhang, Lisheng lisheng.zh...@broadvision.com Hi, We are using solr 3.6.1, our application has many cores (more than 1K), the problem is that solr starting took a long time (10m). Examing log file and code we found that for each core we loaded many resources, but in our app, we are sure we are always using the same solrconfig.xml and schema.xml for all cores. While we can config schema.xml to be shared, we cannot share SolrConfig object. But looking inside SolrConfig code, we donot use any of the cache. Could we somehow change config (or source code) to share resource between cores to reduce solr starting time? 
Thanks very much for helps, Lisheng
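A hedged sketch of the two options Erick mentions, assuming the legacy solr.xml format (core names are illustrative): shareSchema to load identical schema.xml files only once, and the 4.3+ LotsOfCores attributes for lazy loading:

```xml
<solr persistent="true">
  <!-- shareSchema="true" lets cores with identical schema.xml share one copy. -->
  <cores adminPath="/admin/cores" shareSchema="true" transientCacheSize="100">
    <!-- 4.3+ only: don't load at startup, evict from cache when idle. -->
    <core name="core0001" instanceDir="core0001" loadOnStartup="false" transient="true"/>
    <core name="core0002" instanceDir="core0002" loadOnStartup="false" transient="true"/>
  </cores>
</solr>
```

On 3.6.1 only the shareSchema part applies; the transient/loadOnStartup attributes are ignored before 4.3.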
Re: synonym indexing in solr
Look at the text_general type (solr 4.x) in the example schema.xml. That has an example of including synonyms at index time (although it is commented out, but you can get the idea). So to substitute synonyms at index time, just uncomment the index-time analyzer mention of synonyms and comment out the one in the query-time analysis chain. Be cautious about doing synonym expansion at both index and query time. It's perfectly legal but often not what you want if you use the same synonym list. Best Erick On Wed, May 22, 2013 at 7:02 AM, Sagar Chaturvedi sagar.chaturv...@nectechnologies.in wrote: Thanks. Already used it. Quite easy to setup. But it tells how to setup Synonym search. I am asking about synonym indexing. -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:18 PM To: solr-user@lucene.apache.org Subject: Re: synonym indexing in solr Hello, I think that what is written about the SynonymFilterFactory in the wiki is well explained, so I will direct you there : http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory On 05/22/2013 11:44 AM, Sagar Chaturvedi wrote: Hi, Since synonym searching has some limitations in solr, so I wanted to know the procedure of Synonym indexing in solr? Please let me know if any guide is available for that. Regards, Sagar DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited.
If you have received this email in error please delete it and notify the sender immediately.
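A hedged sketch of what Erick describes (the field type name and filter attributes follow the stock example schema; treat the details as illustrative): the SynonymFilterFactory sits in the index-time chain, and the query-time chain omits it:

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Synonyms substituted at index time instead of query time. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No SynonymFilterFactory here, so expansion happens only once. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Documents must be reindexed after this change for the index-time synonyms to take effect.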
Re: Solr Faceting doesn't return values.
Probably you're not querying the field you think you are. Try adding debug=all to the URL and I think you'll see something like default_search_field:mm_state_code, which means you're searching for the literal phrase mm_state_code in your default search field (defined in solrconfig.xml for the handler you're using). You won't get any facets if you don't have any documents that match. Best Erick On Wed, May 22, 2013 at 7:42 AM, samabhiK qed...@gmail.com wrote: Hello, I have a field defined in my schema.xml like so: <field name="sa_site_city" type="string" indexed="true" stored="true"/> string is a type: <fieldType name="string" class="solr.StrField" sortMissingLast="true" /> When I run the query for faceting data by the city: http://XX.XX.XX.XX/solr/collection1/select?q=mm_state_code&wt=json&indent=true&facet=true&facet.field=sa_site_city I get an empty result like so: { "responseHeader":{ "status":0, "QTime":1, "params":{ "facet":"true", "indent":"true", "q":"mm_state_code", "facet.field":"sa_site_city", "wt":"json"}}, "response":{"numFound":0,"start":0,"docs":[]}, "facet_counts":{ "facet_queries":{}, "facet_fields":{ "sa_site_city":[]}, "facet_dates":{}, "facet_ranges":{}}} I wonder what am I doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276.html Sent from the Solr - User mailing list archive at Nabble.com.
too many boolean clauses
I got: SyntaxError: Cannot parse 'name:Bbbbm' Using Solr 4.2.1. The name field type definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" types="characters.txt" />
    <filter class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="15"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" types="characters.txt" />
    <filter class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="15"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

Any ideas how to fix it? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/too-many-boolean-clauses-tp4065288.html Sent from the Solr - User mailing list archive at Nabble.com.
Sorting solr search results using multiple fields
hi all, I wanted to know: is there a way I can sort my documents based on 3 fields? I have fields like pop (which is basically the frequency of the searched term in history), autosug (auto-suggested words) and initial_boost (a copy field of autosug such that only a match on the initial term matches, with the whole sentence saved as one token). Now I want the documents to be returned as:
1. initial_boost with pop of 192
2. initial_boost with pop of 156
3. initial_boost with pop of 120
4. autosug with pop of 205
5. autosug with pop of 180
6. autosug with pop of 112
I have tried boosting the initial_boost field, and without the sort it does boost initial_boost over autosug, but as soon as I add sort=pop desc the documents get sorted according to the pop field, disturbing the boost on the fields that I had set. Help anyone... thanks in advance. regards Rohan
Re: setting the collection in cloudsolrserver without using setdefaultcollection.
On 5/21/2013 11:20 PM, mike st. john wrote: Is there any way to set the collection without passing setDefaultCollection in cloudsolrserver? I'm using cloudsolrserver with spring, and would like to autowire it. It's a query parameter: http://wiki.apache.org/solr/SolrCloud#Distributed_Requests Here's how you do it in SolrJ:

SolrQuery query = new SolrQuery();
query.set("collection", "collection3");

Thanks, Shawn
Re: Sorting solr search results using multiple fields
On 22 May 2013 18:26, Rohan Thakur rohan.i...@gmail.com wrote: hi all I wanted to know is there a way I can sort the my documents based on 3 fields I have fields like pop(which is basically frequency of the term searched history) and autosug(auto suggested words) and initial_boost(copy field of autosug such that only match with initial term match having whole sentence saved as one token) [...] You seem to be confusing boosting with sorting. If you sort the results, the boosts are irrelevant. You can sort on multiple fields by separating them by commas, as described under http://wiki.apache.org/solr/CommonQueryParameters#sort Regards, Gora
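A hedged sketch of the comma-separated form for the fields in this thread (assuming the fields used for sorting hold single, sortable values — sorting on a tokenized text field is not meaningful):

```
sort=initial_boost desc, pop desc
```

Solr applies the second field only as a tie-breaker within equal values of the first, which is why any relevance boosts are irrelevant once an explicit sort is given.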
Re: Crawl Anywhere -
Hi, I didn't see this question. Yes, I confirm Crawl-Anywhere can crawl in a distributed environment. If you have several huge web sites to crawl, you can dispatch crawling across several crawler engines. However, one single web site can only be crawled by one crawler engine at a time. This limitation should be removed in a future version. For your information, the new version 4.0.0 is now available as an open-source project hosted on Github - https://github.com/bejean/crawl-anywhere Regards. On 11/02/13 12:02, O. Klein wrote: Yes you can run CA on different machines. In Manage you have to set target and engine for this to work. I've never done this, so you have to contact the developer for more details. SivaKarthik wrote: Hi All, in our project, we need to download around millions of pages... so is there any support for doing the crawling in a distributed environment using the crawl-anywhere apps? Or what could be the alternatives...? Thanks in advance.. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4039674.html Sent from the Solr - User mailing list archive at Nabble.com. -- Dominique Béjean +33 6 08 46 12 43 skype: dbejean www.eolya.fr www.crawl-anywhere.com www.mysolrserver.com
Re: [ANNOUNCE] Web Crawler
Hi, Crawl-Anywhere is now open-source - https://github.com/bejean/crawl-anywhere Best regards. On 02/03/11 10:02, findbestopensource wrote: Hello Dominique Bejean, Good job. We identified almost 8 open source web crawlers http://www.findbestopensource.com/tagged/webcrawler I don't know how far yours would be different from the rest. Your license states that it is not open source but it is free for personal use. Regards Aditya www.findbestopensource.com http://www.findbestopensource.com On Wed, Mar 2, 2011 at 5:55 AM, Dominique Bejean dominique.bej...@eolya.fr mailto:dominique.bej...@eolya.fr wrote: Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes:

* a crawler
* a document processing pipeline
* a solr indexer

The crawler has a web administration interface in order to manage the web sites to be crawled. Each web site crawl is configured with a lot of possible parameters (not all mandatory):

* number of simultaneous items crawled by site
* recrawl period rules based on item type (html, PDF, …)
* item type inclusion / exclusion rules
* item path inclusion / exclusion / strategy rules
* max depth
* web site authentication
* language
* country
* tags
* collections
* ...

The pipeline includes various ready-to-use stages (text extraction, language detection, Solr ready-to-index xml writer, ...). All is very configurable and extendible, either by scripting or Java coding. With scripting technology, you can help the crawler to handle javascript links or help the pipeline to extract relevant titles and clean up the html pages (remove menus, headers, footers, ..). With Java coding, you can develop your own pipeline stage. The Crawl Anywhere web site provides good explanations and screen shots. All is documented in a wiki. The current version is 1.1.4.
You can download and try it out from here : www.crawl-anywhere.com http://www.crawl-anywhere.com Regards Dominique -- Dominique Béjean +33 6 08 46 12 43 skype: dbejean www.eolya.fr www.crawl-anywhere.com www.mysolrserver.com
Re: Solr Faceting doesn't return values.
Ok after I added debug=all to the query, I get:

{
  "responseHeader":{
    "status":0,
    "QTime":11,
    "params":{
      "facet":"true",
      "indent":"true",
      "q":"mm_state_code",
      "debug":"all",
      "facet.field":"sa_site_city",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "sa_site_city":[]},
    "facet_dates":{},
    "facet_ranges":{}},
  "debug":{
    "rawquerystring":"mm_state_code",
    "querystring":"mm_state_code",
    "parsedquery":"sa_property_id:mm_state_code",
    "parsedquery_toString":"sa_property_id:mm_state_code",
    "explain":{},
    "QParser":"LuceneQParser",
    "timing":{
      "time":4.0,
      "prepare":{
        "time":2.0,
        "query":{"time":0.0},
        "facet":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "debug":{"time":0.0}},
      "process":{
        "time":1.0,
        "query":{"time":0.0},
        "facet":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "debug":{"time":1.0}}}}}

I have not defined any default facet field in the handler in the solrconfig.xml file. Also, there is plenty of data available and the field sa_site_city. What I am trying to understand is this: "parsedquery":"sa_property_id:mm_state_code". I have a field sa_property_id in the schema but I have not defined it in the query nor in solrconfig.xml, so why is it still evaluated? Any help in solving this problem will be greatly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065294.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [ANNOUNCE] Web Crawler
Hi, I did see this message (again). Please use the new dedicated Crawl-Anywhere forum for your next questions: https://groups.google.com/forum/#!forum/crawl-anywhere Did you solve your problem? Thank you Dominique On 29/01/13 09:28, SivaKarthik wrote: Hi, I resolved the issue Access denied for user 'crawler'@'localhost' (using password: YES) - the MySQL user crawler/crawler was created and privileges added as mentioned in the tutorial. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4036978.html Sent from the Solr - User mailing list archive at Nabble.com. -- Dominique Béjean +33 6 08 46 12 43 skype: dbejean www.eolya.fr www.crawl-anywhere.com www.mysolrserver.com
search filter
Dear All, Can I write a search filter for a field whose value is either in a range or equal to a specific value? For example: select profiles with salary 5 to 10, or salary 0. So I expect profiles having salary 0, 5, 6, 7, 8, 9, or 10. It should be possible; can somebody help me with the syntax of the 'fq' filter please? Best Regards kamal
Re: Crawl Anywhere -
Hi, Crawl-Anywhere includes a customizable document processing pipeline. Crawl-Anywhere can also cache original crawled pages and documents in a MongoDB database. Best regards. Dominique On 11/02/13 06:16, SivaKarthik wrote: Dear Erick, Thanks for your reply. Yes, Nutch can meet my requirement, but the problem is that I want to store the crawled documents in HTML or XML format instead of MapReduce format. I am not sure Nutch plugins are available to convert into XML files. Please share if you have any idea. Thank You -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4039619.html Sent from the Solr - User mailing list archive at Nabble.com. -- Dominique Béjean www.crawl-anywhere.com
Re: Solr Faceting doesn't return values.
Ok my bad. I do have a default field defined in the /select handler in the config file. <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">10</int> <str name="df">sa_property_id</str> </lst> But then how do I change my query now? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065298.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: search filter
Hello! You can try sending a filter like this fq=Salary:[5+TO+10]+OR+Salary:0 It should work -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Dear All Can I write a search filter for a field having a value in a range or a specific value. Say if I want to have a filter like 1. Select profiles with salary 5 to 10 or Salary 0. So I expect profiles having salary either 0 , 5, 6, 7, 8, 9, 10 etc. It should be possible, can somebody help me with syntax of 'fq' filter please. Best Regards kamal
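A quick way to sanity-check what that filter selects - a minimal Python sketch (the field name Salary and the bounds come from the question; the predicate simply mirrors the semantics of the fq clause):

```python
def build_fq(low, high, special):
    # URL-encoded Solr filter query: inclusive range OR an exact value
    return f"fq=Salary:[{low}+TO+{high}]+OR+Salary:{special}"

def matches(salary, low=5, high=10, special=0):
    # Mirror of what the filter selects: low..high inclusive, or exactly `special`
    return low <= salary <= high or salary == special

print(build_fq(5, 10, 0))                    # fq=Salary:[5+TO+10]+OR+Salary:0
print([s for s in range(12) if matches(s)])  # [0, 5, 6, 7, 8, 9, 10]
```

Note the square brackets make the range bounds inclusive, so 5 and 10 themselves are returned.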
Re: too many boolean clauses
On 5/22/2013 6:43 AM, adm1n wrote: SyntaxError: Cannot parse 'name:Bbbbm' The subject mentions one error, the message says another. If you are getting too many boolean clauses, then you need to increase the maxBooleanClauses in your solrconfig.xml file. The default is 1024: <maxBooleanClauses>1024</maxBooleanClauses> Looking at your analyzer chain, I see two potential problems. One is that you have two tokenizer factories, though one is specified as a filter. I don't know if you can use a tokenizer as a filter - you might need NGramFilterFactory instead. If using a tokenizer as a filter actually works, then we run into the other possible problem: I can imagine that with the input you have specified, the NGram expansion in your config might balloon that to more than 1024 tokens, which would exceed the default maxBooleanClauses. Thanks, Shawn
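To see how quickly n-gram expansion can blow past the default limit, here is a rough Python sketch (the example term and the gram-size bounds are invented for illustration; real token counts depend on the minGramSize/maxGramSize configured on the filter):

```python
def char_ngrams(term, min_n=1, max_n=25):
    # All character n-grams an NGram filter would emit for a single term
    return [term[i:i + n]
            for n in range(min_n, min(max_n, len(term)) + 1)
            for i in range(len(term) - n + 1)]

MAX_BOOLEAN_CLAUSES = 1024  # Solr's default

# Hypothetical query of three long terms, each expanded into n-grams
terms = ["somereallylongproductidentifier"] * 3
total_clauses = sum(len(char_ngrams(t)) for t in terms)
print(total_clauses)                        # 1425
print(total_clauses > MAX_BOOLEAN_CLAUSES)  # True
```

A 31-character term with grams of length 1 to 25 alone yields 475 tokens, so a query of just three such terms already exceeds the 1024-clause limit.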
Re: Sorting solr search results using multiple fields
Thanks Gora, I got that. One more thing: what I have actually done is made a document consisting of fields: { autosug:galaxy, query_id:1414, pop:168, initial_boost:galaxy _version_:1435669695565922305, score:1.8908522} This initial_boost is basically a copy field of autosug, but saved using different analysers: the whole sentence is kept as a single token and edge ngrams are generated, so that when I search on this field only a term matching from the start will match... and for any other infix term match I have the autosug field. So now what I want is to show the documents matched on initial_boost first, and then the documents matched on the autosug field, each group sorted by the pop field (separately), and return the result. Now, from your suggestion, I could do this using sort on multiple fields by separating them by commas, as described under http://wiki.apache.org/solr/CommonQueryParameters#sort but for that I would require a field having a greater value (all equal, say 2) for the initial_boost matches and a smaller one (all same, say 1) for the autosug matches. How can I do this? Or is there some better solution? thanks regards Rohan On Wed, May 22, 2013 at 6:39 PM, Gora Mohanty g...@mimirtech.com wrote: On 22 May 2013 18:26, Rohan Thakur rohan.i...@gmail.com wrote: hi all I wanted to know is there a way I can sort the my documents based on 3 fields I have fields like pop(which is basically frequency of the term searched history) and autosug(auto suggested words) and initial_boost(copy field of autosug such that only match with initial term match having whole sentence saved as one token) [...] You seem to be confusing boosting with sorting. If you sort the results, the boosts are irrelevant. You can sort on multiple fields by separating them by commas, as described under http://wiki.apache.org/solr/CommonQueryParameters#sort Regards, Gora
RE: How do I use CachedSqlEntityProcessor?
Thank you bbarani. Unfortunately, this does not work. I do not get any exception, and the documents import OK. However there is no Category1, Category2 … etc. when I retrieve the documents. I don’t think I am using the Alpha or Beta of 4.0. I think I downloaded the plain vanilla release version. O. O. bbarani wrote Try this.. <entity name="Cat1" query="SELECT CategoryName,SKU from CAT_TABLE WHERE CategoryLevel=1" cacheKey="Cat1.SKU" cacheLookup="Product.SKU" processor="CachedSqlEntityProcessor"> <field column="CategoryName" name="Category1"/> </entity> sample data import config: <entity name="property" query="select UID,name as name, value as value from opTable where type='${dataimporter.request.type}' and indexed='Y'" processor="CachedSqlEntityProcessor" cacheKey="UID" cacheLookup="object.uid" transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer"> <field column="value" name="${property.name}"/> <!-- dynamic column --> </entity> Also not sure if you are using the Alpha / Beta release of SOLR 4.0. In Solr 3.6, 3.6.1, 4.0-Alpha and 4.0-Beta, the cacheKey parameter was re-named cachePk. This is renamed back for 4.0 (and 3.6.2, if released). See SOLR-3850 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065309.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: How do I use CachedSqlEntityProcessor?
There was a mistake in my last reply. Your child entities need to SELECT on the join key so DIH has it to do the join. So use SELECT SKU, CategoryName... James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: O. Olson [mailto:olson_...@yahoo.it] Sent: Tuesday, May 21, 2013 5:06 PM To: solr-user@lucene.apache.org Subject: RE: How do I use CachedSqlEntityProcessor? Thank you James and bbarani. This worked in the sense that there was no error or exception in the data import. Unfortunately, I do not see any of my Category1, Category2 etc. when I retrieve the documents. If I use the first configuration of the db-data-config.xml posted in my original post, I see these fields in each document. Doing an import with your suggestion of <entity name="Cat1" query="SELECT CategoryName from CAT_TABLE WHERE CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU" processor="CachedSqlEntityProcessor"> <field column="CategoryName" name="Category1"/> </entity> I do not see Category1. I have not changed my schema.xml, so I don’t think this should affect the results. For e.g. Category1 is declared as: <field name="Category1" type="string" indexed="true" stored="true" multiValued="true"/> I am curious what I am doing wrong. I should mention that I am using Solr 4.0.0. I know a more recent version is out – but I don’t think it should make a difference. Thank you again for your help. O. O. Dyer, James-2 wrote First remove the where condition from the child entities, then use the cacheKey and cacheLookup parameters to instruct DIH how to do the join. Example: <entity name="Cat1" cacheKey="SKU" cacheLookup="Product.SKU" query="SELECT CategoryName from CAT_TABLE where CategoryLevel=1"/> See http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor , particularly the 3rd configuration option.
James Dyer Ingram Content Group (615) 213-4311 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065091.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: too many boolean clauses
First of all, thanks for the response! Regarding the two tokenizers - it's ok. Switching to NGramFilterFactory didn't help (though I didn't reindex, I don't think that was needed since I switched it in the 'query' section). Now regarding maxBooleanClauses - how does it affect performance (response times, memory usage) when increasing it? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/too-many-boolean-clauses-tp4065288p4065314.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: too many boolean clauses
Now regarding maxBooleanClauses - how does it affect performance (response times, memory usage) when increasing it? Changing maxBooleanClauses doesn't make any difference at all. Having thousands of clauses is what makes things run slower and take more memory. The setting just causes large queries to fail without running. If you need a query with more than 1024 clauses and there's no other way to do the job, then you have to increase it. Thanks, Shawn
Large-scale Solr publish - hanging at blockUntilFinished indefinitely - stuck on SocketInputStream.socketRead0
*Problem:* We periodically rebuild our Solr index from scratch. We have built a custom publisher that horizontally scales to increase write throughput. On a given rebuild, we will have ~60 JVMs running with 5 threads that are actively publishing to all Solr masters. For each thread, we instantiate one StreamingUpdateSolrServer( QueueSize:100, QueueThreadSize: 2 ) for each master = 20 servers/thread. At the end of a publish cycle (we publish in smaller chunks = 5MM records), we execute server.blockUntilFinished() on each of the 20 servers on each thread ( 100 total ). Before we applied a recent change, this would always execute to completion. There were a few hang-ups on publishes but we consistently re-published our entire corpus in 6-7 hours. The *problem* is that the blockUntilFinished hangs indefinitely. From the java thread dumps, it appears that the loop in StreamingUpdateSolrServer thinks a runner thread is still active so it blocks (as expected). The other note about the java thread dump is that the active runner thread is exactly this: *Hung Runner Thread:* pool-1-thread-8 prio=3 tid=0x0001084c nid=0xfe runnable [0x5c7fe000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) - locked 0xfffe81dbcbe0 (a java.io.BufferedInputStream) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:154) Although the runner thread is reading the socket, there is absolutely no activity on the Solr clients. Other than the blockUntilFinished thread, the client is basically sleeping. *Recent Change:* We increased the maxFieldLength from 1(default) to 2147483647 (Integer.MAX_VALUE). Given this change is server side, I don't know how this would impact adding a new document. I see how it would increase commit times and index size, but don't see the relationship to hanging client adds. *Ingest Workflow:* 1) Pull artifacts from relational database (PDF/TXT/Java bean) 2) Extract all searchable text fields -- this is where we use Tika, independent of Solr 3) Using the SolrJ client, we publish an object that is serialized to XML and written to the master 4) execute blockUntilFinished for all 20 servers on each thread. 5) Autocommit set on servers at 30 minutes or 50k documents. During republish, 50k threshold is met first. *Environment:* Solr v3.5.0 20 masters 2 slaves/master = 40 slaves *Corpus:* We have ~100MM records, ranging in size from 50MB PDFs to 1KB TXT files. Our schema has an unusually large number of fields, 200. Our index size averages about 30GB/shard, totaling 600GB.
*Related Bugs:* My symptoms are most closely related to this bug, but we are not executing any deletes, so I have low confidence that it is 100% related: https://issues.apache.org/jira/browse/SOLR-1990 Although we have similar stack traces, we are only ADDING docs. Thanks ahead for any input/help! -- Justin Babuscio
RE: How do I use CachedSqlEntityProcessor?
Thank you very much James. Your suggestion worked perfectly! I am curious why I did not get any errors before. For others, the following worked for me: <entity name="Cat1" query="SELECT CategoryName, SKU from CAT_TABLE WHERE CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU" processor="CachedSqlEntityProcessor"> <field column="CategoryName" name="Category1"/> </entity> Similarly for the other categories, i.e. Category2, Category3, etc. I am now going to try this for a larger dataset. I hope this works. O.O. Dyer, James-2 wrote There was a mistake in my last reply. Your child entities need to SELECT on the join key so DIH has it to do the join. So use SELECT SKU, CategoryName... James Dyer Ingram Content Group (615) 213-4311 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065342.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do I use CachedSqlEntityProcessor?
I am curious why I did not get any errors before. Because there was no (syntax) error before - the fact that you didn't include a SKU (but were using it as cacheKey) just doesn't match anything .. therefore you got nothing added to your documents. Perhaps we should add a ticket as an improvement for that, to issue a notice/warning if the result set itself doesn't contain the cacheKey? WDYT James? Stefan On Wednesday, May 22, 2013 at 5:14 PM, O. Olson wrote: Thank you very much James. Your suggestion worked perfectly! I am curious why I did not get any errors before. For others, the following worked for me: <entity name="Cat1" query="SELECT CategoryName, SKU from CAT_TABLE WHERE CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU" processor="CachedSqlEntityProcessor"> <field column="CategoryName" name="Category1"/> </entity> Similarly for the other categories, i.e. Category2, Category3, etc. I am now going to try this for a larger dataset. I hope this works. O.O. Dyer, James-2 wrote There was a mistake in my last reply. Your child entities need to SELECT on the join key so DIH has it to do the join. So use SELECT SKU, CategoryName... James Dyer Ingram Content Group (615) 213-4311 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065342.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
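The silent failure mode Stefan describes is easy to reproduce outside DIH. Here is a small Python sketch of the cache join that CachedSqlEntityProcessor performs (table and column names come from the thread; the row data is invented): if the child SELECT omits the cacheKey column, every lookup quietly comes back empty rather than raising an error.

```python
# Parent rows (Product) and child rows (CAT_TABLE, CategoryLevel=1) -- invented data
products = [{"SKU": "A1"}, {"SKU": "B2"}]
cats_with_sku = [{"SKU": "A1", "CategoryName": "Books"},
                 {"SKU": "B2", "CategoryName": "Toys"}]
cats_without_sku = [{"CategoryName": "Books"}, {"CategoryName": "Toys"}]

def cached_join(parents, children, cache_key="SKU", cache_lookup="SKU"):
    # Build the child-entity cache keyed on cacheKey, then probe it with each
    # parent's cacheLookup value -- roughly what DIH's cached processor does.
    cache = {}
    for row in children:
        if cache_key in row:  # rows missing the key column can never be found
            cache.setdefault(row[cache_key], []).append(row["CategoryName"])
    return {p["SKU"]: cache.get(p[cache_lookup], []) for p in parents}

print(cached_join(products, cats_with_sku))     # {'A1': ['Books'], 'B2': ['Toys']}
print(cached_join(products, cats_without_sku))  # {'A1': [], 'B2': []} -- no error, just empty
```

This mirrors why the fixed query (SELECT SKU, CategoryName ...) works while the original one imported cleanly but attached no categories.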
Re: [custom data structure] aligned dynamic fields
Although we are entering the era of Big Data, that does not mean there are no limits or restrictions on what a given technology can do. Maybe you need to consider either a smaller scope for your project, or more limited features, or some other form of simplification. Solr can do billions of documents - for a heavily sharded cluster, but you will have to work really hard to make that work well. So, I can confirm, that maybe in this case, there is no free lunch - unless you are willing to strip down the project. Or, maybe we just need a deeper feel for what your data model is really trying to achieve. Suggestion: Think about your data model again, and then try rephrasing it for this group. You have violated one cardinal rule of this group: you focused on a proposed solution rather than focusing our attention on the original problem you are trying to solve. That short-circuited our focus on really solving your problem. -- Jack Krupansky -Original Message- From: Dmitry Kan Sent: Wednesday, May 22, 2013 6:50 AM To: solr-user@lucene.apache.org Subject: Re: [custom data structure] aligned dynamic fields Jack, Thanks for your response. 1. Flattening could be an option, although our scale and required functionality (runtime non DocValues backed facets) is beyond what solr3 can handle (billions of docs). We have flattened the meta data at the expense of over-generating solr documents. But to solve the problem I have described via flattening would make big impact on the scalability and price. 2. We have quite the opposite of what you have described about the dynamic fields: there will be very few per document. I agree, that caution should be taken here, as we have suffered (or should I say experienced) having multivalued fields (the good thing is we never had to facet on them). Any other options? Maybe someone can share their experience with dynamic fields and discourage from pursuing this path? 
Dmitry On Mon, May 20, 2013 at 4:23 PM, Jack Krupansky j...@basetechnology.com wrote: Before you dive off the deep end and go crazy with dynamic fields, try a clean, simple, Solr-oriented static design. Yes, you CAN do an over-complicated design with dynamic fields, but that doesn't mean you should. In a single phrase, denormalize and flatten your design. Sure, that will lead to a lot of rows, but Solr and Lucene are designed to do well in that scenario. If you are still thinking in terms of a C struct, go for a long walk or do SOMETHING else until you can get that idea out of your head. It is a sub-optimal approach for exploiting the power of Lucene and Solr. Stay with a static schema design until you hit... just stay with a static schema, period. Dynamic fields and multi-valued fields do have value, but only when used in moderation - small numbers. If you start down a design path and find that you are heavily dependent on dynamic fields and/or multi-valued fields with large numbers of values per document, that is feedback that your design needs to be denormalized and flattened further. -- Jack Krupansky -Original Message- From: Dmitry Kan Sent: Monday, May 20, 2013 7:06 AM To: solr-user@lucene.apache.org Subject: [custom data structure] aligned dynamic fields Hi all, Our current project requirement suggests that we should start storing custom data structures in the Solr index. The custom data structure would be an equivalent of a C struct. The task is as follows. Suppose we have two types of fields, one is FieldName1 and the other FieldName2. Suppose also that we can have multiple pairs of these two fields on a document in Solr. That is, in notation of dynamic fields: doc1 FieldName1_id1 FieldName2_id1 FieldName1_id2 FieldName2_id2 doc2 FieldName1_id3 FieldName2_id3 FieldName1_id4 FieldName2_id4 FieldName1_id5 FieldName2_id5 etc What we would like to have is a value for the Field1_(some_unique_id) and a value for Field2_(some_unique_id) as input for search.
That is we wouldn't care about the some_unique_id in some search scenarios. And the search would automatically iterate the pairs of dynamic fields and respect the pairings. I know it used to be so, that with dynamic fields a client must provide the dynamically generated field names coupled with their values up front when searching. What data structure / solution could be used as an alternative approach to help such a structured search? Thanks, Dmitry
filter query by string length or word count?
I have in schema.xml:

<field name="body" type="text_en_html" indexed="true" stored="true" omitNorms="true"/>
...
<fieldType name="text_en_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

How can I query docs whose body has more than 80 words (or 80 characters)?
Re: Solr Faceting doesn't return values.
Hi There, Not sure I understand your problem correctly, but is 'mm_state_code' a real value or is it field name? Also, as Erick pointed out above, the facets are not calculated if there are no results. Hence you get no facets. You have mentioned which facets you want but you haven't mentioned which field you want to search against. That field should be defined in df parameter instead of sa_property_id. Can you post example solr document you're indexing? -Sandeep On 22 May 2013 14:28, samabhiK qed...@gmail.com wrote: Ok my bad. I do have a default field defined in the /select handler in the config file. lst name=defaults str name=echoParamsexplicit/str int name=rows10/int str name=dfsa_property_id/str /lst But then how do I change my query now? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065298.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Large-scale Solr publish - hanging at blockUntilFinished indefinitely - stuck on SocketInputStream.socketRead0
On 5/22/2013 9:08 AM, Justin Babuscio wrote: We periodically rebuild our Solr index from scratch. We have built a custom publisher that horizontally scales to increase write throughput. On a given rebuild, we will have ~60 JVMs running with 5 threads that are actively publishing to all Solr masters. For each thread, we instantiate one StreamingUpdateSolrServer( QueueSize:100, QueueThreadSize: 2 ) for each master = 20 servers/thread. Looking over all your details, you might want to try first reducing the maxFieldLength to slightly below Integer.MAX_VALUE. Try setting it to 2 billion, or even something more modest, in the millions. It's theoretically possible that the other value might be leading to an overflow somewhere. I've been looking for evidence of this, nothing's turned up yet. There MIGHT be bugs in the Apache Commons libraries that SolrJ uses. The next thing I would try is upgrading those component jars in your application's classpath - httpclient, commons-io, commons-codec, etc. Upgrading to a newer SolrJ version is also a good idea. Your notes imply that you are using the default XML request writer in SolrJ. If that's true, you should be able to use a 4.3 SolrJ even with an older Solr version, which would give you a server object that's based on HttpComponents 4.x, where your current objects are based on HttpClient 3.x. You would need to make adjustments in your source code. If you're not using the default XML request writer, you can get a similar change by using SolrJ 3.6.2. IMHO you should switch to HttpSolrServer (CommonsHttpSolrServer in SolrJ 3.5 and earlier). StreamingUpdateSolrServer (and its replacement in 3.6 and later, named ConcurrentUpdateSolrServer) has one glaring problem - it never informs the calling application about any errors that it encounters during indexing. It lies to you, and tells you that everything has succeeded even when it doesn't. 
The one advantage that SUSS/CUSS has over its Http sibling is that it is multi-threaded, so it can send updates concurrently. You seem to know enough about how it works, so I'll just say that you don't need additional complexity that is not under your control and refuses to throw exceptions when an error occurs. You already have a large-scale concurrent and multi-threaded indexing setup, so SolrJ's additional thread handling doesn't really buy you much. Thanks, Shawn
RE: How do I use CachedSqlEntityProcessor?
That would be a worthy enhancement to do. Always nice to give the user a warning when something is going to fail so they can troubleshoot better... James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Stefan Matheis [mailto:matheis.ste...@gmail.com] Sent: Wednesday, May 22, 2013 10:30 AM To: solr-user@lucene.apache.org Subject: Re: How do I use CachedSqlEntityProcessor? I am curious why I did not get any errors before. Because there was no (syntax) error before - the fact that you didn't include a SKU (but using it as cacheKey) just doesn't match anything .. therefore you got nothing added to your documents. Perhaps we should add an ticket as improvement for that, to issue a notice/warning if the result set itself doesn't contain the cacheKey? WDYT James? Stefan On Wednesday, May 22, 2013 at 5:14 PM, O. Olson wrote: Thank you very much James. Your suggestion worked exactly! I am curious why I did not get any errors before. For others, the following worked for me: entity name=Cat1 query=SELECT CategoryName, SKU from CAT_TABLE WHERE CategoryLevel=1 cacheKey=SKU cacheLookup=Product.SKU processor=CachedSqlEntityProcessor field column=CategoryName name=Category1 / /entity Similarly for other Categories i.e. Category2, Category3, etc. I am now going to try this for a larger dataset. I hope this works. O.O. Dyer, James-2 wrote There was a mistake in my last reply. Your child entities need to SELECT on the join key so DIH has it to do the join. So use SELECT SKU, CategoryName... James Dyer Ingram Content Group (615) 213-4311 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065342.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
Solr French search optimisation
Hello to all, I'm trying to set up Solr 4.2 to index and search French content. I defined a special fieldType for French content:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Unfortunately, this field does not behave as I wish. I'd like to be able to get results from misspelled words. I.e. I wish to get the same result typing Pompe à chaleur as typing pomppe a chaler, or with solère and solaire. I cannot find the right way to create a fieldType that reaches this aim. Thanks in advance for your help; do not hesitate to ask if you need more information. Regards David
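Note that the accent-folding MappingCharFilter plus the Snowball stemmer handle accents and inflection, not typos: nothing in that chain maps pomppe to pompe. What is usually suggested for typos is fuzzy search (e.g. pomppe~2 with the standard query parser) or a phonetic filter. Fuzzy matching is based on edit distance; this small Python sketch shows the distances involved for the examples in the question (the accent-folded form solere is used, i.e. what the MappingCharFilter would produce):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein("pomppe", "pompe"))    # 1 -> a fuzzy query pomppe~2 could match pompe
print(levenshtein("solere", "solaire"))  # 2 -> solere~2 could match solaire
```

Both misspellings are within two edits of the intended term, which is the maximum distance Lucene's fuzzy queries support, so fuzzy search is a plausible fit here.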
Re: Solr Faceting doesn't return values.
Thanks for your reply. I have my request url modified like this: http://xx.xx.xx.xx/solr/collection1/select?q=TX&df=mm_state_code&wt=xml&indent=true&facet=true&facet.field=sa_site_city&debug=all Facet Field = sa_site_city (city-wise facet) Default Field = mm_state_code Query = TX When I run this query, I get something like this: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeader int name=status0/int int name=QTime3/int lst name=params str name=facettrue/str str name=dfsa_site_city/str str name=indenttrue/str str name=qTX/str str name=_1369238921109/str str name=debugall/str str name=facet.fieldsa_site_city/str str name=wtxml/str /lst /lst result name=response numFound=0 start=0 /result lst name=facet_counts lst name=facet_queries/ lst name=facet_fields lst name=sa_site_city/ /lst lst name=facet_dates/ lst name=facet_ranges/ /lst lst name=debug str name=rawquerystringTX/str str name=querystringTX/str str name=parsedquerysa_site_city:TX/str str name=parsedquery_toStringsa_site_city:TX/str lst name=explain/ str name=QParserLuceneQParser/str lst name=timing double name=time2.0/double lst name=prepare double name=time0.0/double lst name=query double name=time0.0/double /lst lst name=facet double name=time0.0/double /lst lst name=mlt double name=time0.0/double /lst lst name=highlight double name=time0.0/double /lst lst name=stats double name=time0.0/double /lst lst name=debug double name=time0.0/double /lst /lst lst name=process double name=time2.0/double lst name=query double name=time1.0/double /lst lst name=facet double name=time1.0/double /lst lst name=mlt double name=time0.0/double /lst lst name=highlight double name=time0.0/double /lst lst name=stats double name=time0.0/double /lst lst name=debug double name=time0.0/double /lst /lst /lst /lst /response I do have the data in my index, and I verified that by running other queries. I can't figure out what I am missing.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065360.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: filter query by string length or word count?
I doubt if there is any straight out of the box feature that supports this requirement, you will probably need to handle this at the index time. You can play around with Function Queries http://wiki.apache.org/solr/FunctionQuery for any such feature. On 22 May 2013 16:37, Sam Lee skyn...@gmail.com wrote: I have schema.xml field name=body type=text_en_html indexed=true stored=true omitNorms=true/ ... fieldType name=text_en_html class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.HTMLStripCharFilterFactory/ tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_en.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPossessiveFilterFactory/ filter class=solr.KeywordMarkerFilterFactory protected=protwords.txt/ filter class=solr.PorterStemFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_en.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPossessiveFilterFactory/ filter class=solr.KeywordMarkerFilterFactory protected=protwords.txt/ filter class=solr.PorterStemFilterFactory/ /analyzer /fieldType how can I query docs whose body has more than 80 words (or 80 characters) ?
Can anyone explain this Solr query behavior?
This query returns 0 documents: *q=(+Title:() +Classification:() +Contributors:() +text:())* This returns 1 document: *q=doc-id:3000* And this returns 631580 documents when I was expecting 0: *q=doc-id:3000 AND (+Title:() +Classification:() +Contributors:() +text:())* Am I missing something here? Can someone please explain? I am using Solr 4.2.1 Thanks -Shankar
Re: filter query by string length or word count?
Sam,

I would highly suggest counting the words in your external pipeline and sending that value in as a specific field. It can then be queried quite simply with:

wordcount:{80 TO *]

(Note the { next to 80, excluding the value of 80.)

Jason

On May 22, 2013, at 11:37 AM, Sam Lee skyn...@gmail.com wrote:

[schema.xml quoted in full upthread]

how can I query docs whose body has more than 80 words (or 80 characters)?
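To make the pipeline-side counting concrete, a rough Python sketch (the "wordcount" field name and the HTML stripping are illustrative, not part of Solr):

```python
import re

def word_count(html: str) -> int:
    """Strip markup roughly the way HTMLStripCharFilterFactory would,
    then count whitespace-separated words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return len(text.split())

# Attach the count as its own field on the document before it is sent
# to Solr; "wordcount" is an illustrative field name.
doc = {"id": "1", "body": "<p>some short body</p>"}
doc["wordcount"] = word_count(doc["body"])

# At query time, docs with more than 80 words are then just:
#   fq=wordcount:{80 TO *]
```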
Re: Solr Faceting doesn't return values.
From the response you've mentioned it appears to me that the query term TX is searched against sa_site_city instead of mm_state_code. Can you try your query like below:

http://xx.xx.xx.xx/solr/collection1/select?q=*mm_state_code:(**TX)*&wt=xml&indent=true&facet=true&facet.field=sa_site_city&debug=all

and post your output?

On 22 May 2013 17:13, samabhiK qed...@gmail.com wrote:

<str name="df">sa_site_city</str>
RE: How do I use CachedSqlEntityProcessor?
Thank you guys, particularly James, very much. I just imported 200K documents in a little more than 2 mins, which is great for me :-). Thank you Stefan; I did not realize that it was not a syntax error and hence no error. Thank you for clearing that up.

O. O.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065392.html
Re: shard splitting
You will need to edit it manually and upload using a ZooKeeper client. You can use kazoo; it's very easy to use.

-- Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Wednesday, May 22, 2013 at 10:04 AM, Arkadi Colson wrote:

clusterstate.json is now reporting shard3 as inactive. Any idea how to change clusterstate.json manually from the command line?

On 05/22/2013 08:59 AM, Arkadi Colson wrote:

Hi

I tried to split a shard but it failed. If I try to do it again it does not start again. I see the two extra shards in /collections/messages/leader_elect/ and /collections/messages/leaders/. How can I fix this?

root@solr07-dcg:/solr/messages_shard3_replica2# curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=messages&shard=shard3'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">300117</int>
  </lst>
  <lst name="error">
    <str name="msg">splitshard the collection time out:300s</str>
    <str name="trace">org.apache.solr.common.SolrException: splitshard the collection time out:300s
      at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:166)
      at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:300)
      at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
      at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
      at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)
      at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
      at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:722)
    </str>
    <int name="code">500</int>
  </lst>
</response>

INFO - 2013-05-22 06:45:54.148; org.apache.solr.handler.admin.CoreAdminHandler; Invoked split action for core: messages_shard3_replica1
INFO - 2013-05-22 06:45:54.271; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partitions=2 segments=29
INFO - 2013-05-22 06:46:03.240; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partition #0 range=2aaa-5554

BR
Arkadi
RE: Speed up import of Hierarchical Data
Just an update for others reading this thread: I had some questions about CachedSqlEntityProcessor and had them addressed in the thread "How do I use CachedSqlEntityProcessor?" (http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-td4064919.html).

I basically had to declare the child entities in the db-data-config.xml like:

<entity name="Cat1"
        query="SELECT CategoryName, SKU from CAT_TABLE WHERE CategoryLevel=1"
        cacheKey="SKU"
        cacheLookup="Product.SKU"
        processor="CachedSqlEntityProcessor">
  <field column="CategoryName" name="Category1"/>
</entity>

Thanks to James and others for their help.

O. O.

--
View this message in context: http://lucene.472066.n3.nabble.com/Speed-up-import-of-Hierarchical-Data-tp4063924p4065400.html
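For readers wondering what the cacheKey/cacheLookup pair buys: the processor runs the child query once and joins against parent rows in memory, instead of issuing one SQL query per parent. A toy Python sketch of that in-memory join (not DIH code; table and field names are illustrative):

```python
# Parent rows from the Product query, child rows from CAT_TABLE.
products = [{"SKU": "A1"}, {"SKU": "B2"}]
categories = [
    {"SKU": "A1", "CategoryName": "Tools"},
    {"SKU": "B2", "CategoryName": "Paint"},
]

# Build the cache once, keyed on cacheKey="SKU"...
cache = {}
for row in categories:
    cache.setdefault(row["SKU"], []).append(row)

# ...then each parent row does a dict lookup (cacheLookup="Product.SKU")
# instead of a round trip to the database.
for p in products:
    p["Category1"] = [c["CategoryName"] for c in cache.get(p["SKU"], [])]
```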
Re: Large-scale Solr publish - hanging at blockUntilFinished indefinitely - stuck on SocketInputStream.socketRead0
Shawn,

Thank you! Just some quick responses:

On your overflow theory, why would this impact the client? Is it possible that a write attempt to Solr would block indefinitely while the Solr server is running wild or in a bad state due to the overflow?

We attempt to set the BinaryRequestWriter, but per this bug: https://issues.apache.org/jira/browse/SOLR-1565, v3.5 uses the default XML writer.

On upgrading to 3.6.2 or 4.x, we have an organizational challenge that requires approval of the software/upgrade. I am promoting/supporting this idea but cannot execute in the short term.

For the mass publish, we originally used the CommonsHttpSolrServer (what we use in live production updates) but we found the trade-off in performance was quite large.

I really like your idea about KISS on threading. Since I'm already introducing complexity with all the multi-threading, why stress the older 3.x software? We may need to trade off time for this.

My first tactics will be to adjust the maxFieldLength and toggle the configuration to use CommonsHttpSolrServer. I will follow up with any discoveries.

Thanks again,
Justin

On Wed, May 22, 2013 at 11:46 AM, Shawn Heisey s...@elyograg.org wrote:

On 5/22/2013 9:08 AM, Justin Babuscio wrote:

We periodically rebuild our Solr index from scratch. We have built a custom publisher that horizontally scales to increase write throughput. On a given rebuild, we will have ~60 JVMs running with 5 threads that are actively publishing to all Solr masters. For each thread, we instantiate one StreamingUpdateSolrServer( QueueSize:100, QueueThreadSize: 2 ) for each master = 20 servers/thread.

Looking over all your details, you might want to try first reducing the maxFieldLength to slightly below Integer.MAX_VALUE. Try setting it to 2 billion, or even something more modest, in the millions. It's theoretically possible that the other value might be leading to an overflow somewhere. I've been looking for evidence of this; nothing's turned up yet.
There MIGHT be bugs in the Apache Commons libraries that SolrJ uses. The next thing I would try is upgrading those component jars in your application's classpath - httpclient, commons-io, commons-codec, etc. Upgrading to a newer SolrJ version is also a good idea. Your notes imply that you are using the default XML request writer in SolrJ. If that's true, you should be able to use a 4.3 SolrJ even with an older Solr version, which would give you a server object that's based on HttpComponents 4.x, where your current objects are based on HttpClient 3.x. You would need to make adjustments in your source code. If you're not using the default XML request writer, you can get a similar change by using SolrJ 3.6.2. IMHO you should switch to HttpSolrServer (CommonsHttpSolrServer in SolrJ 3.5 and earlier). StreamingUpdateSolrServer (and its replacement in 3.6 and later, named ConcurrentUpdateSolrServer) has one glaring problem - it never informs the calling application about any errors that it encounters during indexing. It lies to you, and tells you that everything has succeeded even when it doesn't. The one advantage that SUSS/CUSS has over its Http sibling is that it is multi-threaded, so it can send updates concurrently. You seem to know enough about how it works, so I'll just say that you don't need additional complexity that is not under your control and refuses to throw exceptions when an error occurs. You already have a large-scale concurrent and multi-threaded indexing setup, so SolrJ's additional thread handling doesn't really buy you much. Thanks, Shawn -- Justin Babuscio 571-210-0035 http://linchpinsoftware.com
Re: Solr Faceting doesn't return values.
When I use your query, I get:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">12</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="df">mm_state_code</str>
      <str name="indent">true</str>
      <str name="q">*mm_state_code:(**TX)*</str>
      <str name="_">1369244078714</str>
      <str name="debug">all</str>
      <str name="facet.field">sa_site_city</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <lst name="error">
    <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse '*mm_state_code:(**TX)*': Encountered ":" at line 1, column 14. Was expecting one of: EOF, AND ..., OR ..., NOT ..., "+" ..., "-" ..., BAREOPER ..., "(" ..., "*" ..., "^" ..., QUOTED ..., TERM ..., FUZZY_SLOP ..., PREFIXTERM ..., WILDTERM ..., REGEXPTERM ..., "[" ..., "{" ..., LPARAMS ..., NUMBER ...</str>
    <int name="code">400</int>
  </lst>
</response>

Not sure why the data won't show up. Almost all the records have data in the field sa_site_city, and it is also indexed. :(

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065406.html
Re: Large-scale Solr publish - hanging at blockUntilFinished indefinitely - stuck on SocketInputStream.socketRead0
On 5/22/2013 11:25 AM, Justin Babuscio wrote: On your overflow theory, why would this impact the client? Is is possible that a write attempt to Solr would block indefinitely while the Solr server is running wild or in a bad state due to the overflow? That's the general notion. I could be completely wrong about this, but as that limit is the only thing you changed, it was the idea that came to mind first. One other thing I thought of, though this would be a band-aid, not a real solution - if there's a definable maximum amount of time that an individual update request should take to complete (1 minute? 5 minutes?) then you might be able to use the setSoTimeout call on your server object. In the 3.5.0 source code, this method is inherited, so it might not actually work correctly, but I'm hopeful. If the problem is stuck update requests (and not a bug in blockUntilFinished), setting the SoTimeout (assuming it works) might unplug the works. The stuck requests might fail, but your SolrJ log might contain enough info to help you track that down. I don't think your application would ever be notified about such failures, but they should be logged. Good luck with the upgrade plan. Would you be able to upgrade the dependent jars for the existing SolrJ without an extensive approval process? I won't be surprised if the answer is no. On SOLR-1990, I don't think that's it, because unless blockUntilFinished() itself is broken, calling it more often than strictly necessary shouldn't be an issue. Do you see any problems in the server log? Thanks, Shawn
MoreLikeThis - No Results
I'm developing a recommendation feature in our app using the MoreLikeThisHandler (http://wiki.apache.org/solr/MoreLikeThisHandler), and so far it is doing a great job. We're using a user's competency keywords as the MLT field list and the user's corresponding document in Solr as the comparison document. I have found that for one user I'm not receiving any recommendations, and I'm not sure why.

Solr: 4.1.0

relevant schema:

<field name="competencyKeywords" type="short-mlt-text" indexed="true" stored="true" multiValued="true" termVectors="true"/>
<fieldType name="short-mlt-text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

user's values:

<arr name="competencyKeywords">
  <str>Healthcare Cost Trends</str>
</arr>

Is it possible that among all the ~40,000 users in this index (about 500 of which have the same competency keywords), the words "healthcare", "cost" and "trends" are just judged by Lucene to not be significant? I realize that I may not understand how the MLT Handler is doing things under the covers... I've only been guessing until now based on the (otherwise excellent) results I've been seeing.

Thanks,
Andy Pickler

P.S. For some additional information, the following query:

/mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.interestingTerms=details&debugQuery=true&mlt.match.include=false

...produces the following results...

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="interestingTerms"/>
  <lst name="debug">
    <str name="rawquerystring">objectId:user91813</str>
    <str name="querystring">objectId:user91813</str>
    <str name="parsedquery"/>
    <str name="parsedquery_toString"/>
    <lst name="explain"/>
  </lst>
</response>
hostname - ipaddress change in solr4.0 to solr4.1+
Logging/UI used to show the hostname in 4.0; in 4.1+ it switched to IP addresses. Is this by design, or a bug/side effect? It's pretty painful to look at IP addresses. I am planning to change it; let me know if you have any concerns.

--
Anirudha
Re: solr starting time takes too long
: Subject: solr starting time takes too long : In-Reply-To: 519c6cd6.90...@smartbit.be : Thread-Topic: shard splitting https://people.apache.org/~hossman/#threadhijack -Hoss
Re: hostname - ipaddress change in solr4.0 to solr4.1+
On 5/22/2013 12:53 PM, Anirudha Jadhav wrote: Logging/UI used to show hostname in 4.0 in 4.1+ it switched to ip addresses is this by design or a bug/side effect ? If you are talking about SolrCloud, this was an intentional change. By including a host property either on the Solr startup command or in solr.xml, you can force SolrCloud to use hostnames. http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params If you aren't talking about SolrCloud, can you give specific examples of what you are seeing? Thanks, Shawn
RE: Slow Highlighter Performance Even Using FastVectorHighlighter
After taking your advice on profiling, I didn't see any memory issues. I wanted to verify this with a small set of data, so I created a new sandbox core with the exact same schema and config file settings. I indexed only 25 PDF documents with an average size of 2.8 MB; the largest is approx 5 MB (39 pages). I run the exact same query on that core and I'm seeing response times of 7 secs or more. Without highlighting the response is usually 1 ms. I don't understand why it's taking 7 secs to return highlights. The size of the index is only 20.93 MB. The JVM heap Xms and Xmx are both set to 1024 for this verification purpose and that should be more than enough. The processor is plenty powerful enough as well.

Running VisualVM shows all my CPU time being taken by mainly these 3 methods:

org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset()
org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset()
org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap()

My guess is that this has something to do with how I'm handling partial word matches/highlighting. I have set up another request handler that only searches the whole-word fields and it returns in 850 ms with highlighting. Any ideas?

- Andy

-Original Message-
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
Sent: Monday, May 20, 2013 1:39 PM
To: solr-user@lucene.apache.org
Subject: RE: Slow Highlighter Performance Even Using FastVectorHighlighter

My guess is that the problem is those 200M documents. FastVectorHighlighter is fast at deciding whether a match, especially a phrase, appears in a document, but it still starts out by walking the entire list of term vectors, and ends by breaking the document into candidate-snippet fragments, both processes that are proportional to the length of the document.
It's hard to do much about the first, but for the second you could choose to expose FastVectorHighlighter's FieldPhraseList representation, and return offsets to the caller rather than fragments, building up your own snippets from a separate store of indexed files. This would also permit you to set stored=false, improving your memory/core size ratio, which I'm guessing could use some improving. It would require some work, and it would require you to store a representation of what was indexed outside the Solr core, in some constant-bytes-to-character representation that you can use offsets with (e.g. UTF-16, or ASCII+entity references). However, you may not need to do this -- it may be that you just need more memory for your search machine. Not JVM memory, but memory that the O/S can use as a file cache. What do you have now? That is, how much memory do you have that is not used by the JVM or other apps, and how big is your Solr core? One way to start getting a handle on where time is being spent is to set up VisualVM. Turn on CPU sampling, send in a bunch of the slow highlight queries, and look at where the time is being spent. If it's mostly in methods that are just reading from disk, buy more memory. If you're on Linux, look at what top is telling you. If the CPU usage is low and the wa number is above 1% more often than not, buy more memory (I don't know why that wa number makes sense, I just know that it has been a good rule of thumb for us). -- Bryan -Original Message- From: Andy Brown [mailto:andy_br...@rhoworld.com] Sent: Monday, May 20, 2013 9:53 AM To: solr-user@lucene.apache.org Subject: Slow Highlighter Performance Even Using FastVectorHighlighter I'm providing a search feature in a web app that searches for documents that range in size from 1KB to 200MB of varying MIME types (PDF, DOC, etc). Currently there are about 3000 documents and this will continue to grow. I'm providing full word search and partial word search. 
For each document, there are three source fields that I'm interested in searching and highlighting on: name, description, and content. Since I'm providing both full and partial word search, I've created additional fields that get tokenized differently: name_par, description_par, and content_par. Those are indexed and stored as well for querying and highlighting. As suggested in the Solr wiki, I've got two catch all fields text and text_par for faster querying. An average search results page displays 25 results and I provide paging. I'm just returning the doc ID in my Solr search results and response times have been quite good (1 to 10 ms). The problem in performance occurs when I turn on highlighting. I'm already using the FastVectorHighlighter and depending on the query, it has taken as long as 15 seconds to get the highlight snippets. However, this isn't always the case. Certain query terms result in 1 sec or less response time. In any case, 15 seconds is way too long. I'm fairly new to Solr but I've spent days coming up with what I've got so far. Feel free to
Re: MoreLikeThis - No Results
Answered my own question...

mlt.mintf: Minimum Term Frequency - the frequency below which terms will be ignored in the source doc

Our source doc is a set of limited terms... not a large content field. So in our case I need to set that value to 1 (rather than the default of 2). Now I'm getting results... and they indeed are relevant.

Thanks,
Andy Pickler

On Wed, May 22, 2013 at 12:20 PM, Andy Pickler andy.pick...@gmail.com wrote:

[original message quoted in full upthread]
Re: Boosting Documents
: NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for
: any fields where the index-time boost should be stored.
:
: In my case where I only need to boost the whole document (not a specific
: field), do I have to activate omitNorms=false for all the fields
: in the schema?

docBoost is really just syntactic sugar for a field boost on each field in the document -- it's factored into the norm value for each field in the document. (I'll update the wiki to make this more clear.) If you do a query that doesn't utilize any field which has norms, then the docBoost you specified when indexing the document never comes into play.

In general, doc boosts and field boosts, and the way they come into play as part of the field norm, are fairly inflexible and (in my opinion) antiquated. A much better way of dealing with this type of problem is also discussed in the section of the wiki you linked to. Immediately below...

http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts

...you'll find...

http://wiki.apache.org/solr/SolrRelevancyFAQ#Field_Based_Boosting

-Hoss
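A minimal sketch of what the query-time alternative can look like, building dismax-style parameters in Python (assuming the (e)dismax parser; field names and boost values here are illustrative, not from the thread):

```python
from urllib.parse import urlencode

# Query-time, field-based boosting instead of index-time docBoost:
# weight "title" matches ten times heavier than "body", and fold a
# stored "popularity" field into the score via a function boost.
params = {
    "defType": "edismax",
    "q": "solr relevancy",
    "qf": "title^10 body",     # per-field query boosts
    "bf": "log(popularity)",   # additive function boost
}
url = "/select?" + urlencode(params)
print(url)
```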
Scheduling DataImports
Hi,

I am new to Solr and recently started exploring it for the search/sort needs in our webapp. I have a couple of questions (I am using Solr 4.2.1 with the default core named collection1):

1. We have a use case where we would like to index data every 10 mins (avg). What's the best way to schedule a data import every 10 mins or so? A cron job?

2. We are indexing data returned from an API which returns different cache TTLs. How can I re-index after a TTL has expired? Some process which polls for the expiring-soon entries and issues a data-import command?

Any pointers will be much appreciated.

Thanks,
-M

--
View this message in context: http://lucene.472066.n3.nabble.com/Scheduling-DataImports-tp4065435.html
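A rough sketch of both pieces (the endpoint, field names, and TTL window are illustrative): cron hits the DataImportHandler URL on a schedule, and a small poller picks out entries whose TTL is about to expire so they can be re-imported first.

```python
import time

# Illustrative endpoint; DIH exposes /dataimport?command=full-import|delta-import.
DIH_URL = "http://localhost:8983/solr/collection1/dataimport?command=delta-import"

def entries_to_refresh(entries, now=None, window=600):
    """Pick entries whose TTL expires within the next `window` seconds,
    so they can be re-imported before they go stale."""
    now = time.time() if now is None else now
    return [e for e in entries if e["expires_at"] <= now + window]

# A cron line could hit DIH_URL every 10 minutes, e.g.:
#   */10 * * * *  curl -s "http://localhost:8983/solr/collection1/dataimport?command=delta-import"
entries = [
    {"id": "a", "expires_at": 100},
    {"id": "b", "expires_at": 10000},
]
print(entries_to_refresh(entries, now=0))
```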
RE: solr starting time takes too long
Very sorry about hijacking the existing thread (I thought it would be OK if I just changed the title and content, but that was still wrong). It will never happen again.

Lisheng

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Wednesday, May 22, 2013 11:58 AM
To: solr-user@lucene.apache.org
Subject: Re: solr starting time takes too long

: Subject: solr starting time takes too long
: In-Reply-To: 519c6cd6.90...@smartbit.be
: Thread-Topic: shard splitting

https://people.apache.org/~hossman/#threadhijack

-Hoss
Re: Russian stopwords
I'm encountering the same issue, but my Russian stopwords.txt IS encoded in UTF-8. I verified the encoding using EmEditor (I've used it for years, and I use it for the existing English, French, Spanish, Portuguese and German Solr configurations, without issues). Just to make extra sure, I downloaded Edit Plus, as mentioned in this thread, and verified the encoding again: UTF-8.

I realize this will pass for a stupid question, but... could there be any issue other than encoding?

Thanks;

--
View this message in context: http://lucene.472066.n3.nabble.com/Russian-stopwords-tp491490p4065440.html
Re: Regular expression in solr
API doc says that: Lucene supports regular expression searches matching a pattern between forward slashes /. The syntax may change across releases, but the current supported syntax is documented in the RegExp class. For example to find documents containing moat or boat: /[mb]oat/ I think that this may help you: http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/util/automaton/RegExp.html 2013/5/22 Oussama Jilal jilal.ouss...@gmail.com There is no ^ or $ in the solr regex since the regular expression will match tokens (not the complete indexed text). So the results you get will basicly depend on your way of indexing, if you use the regex on a tokenized field and that is not what you want, try to use a copy field wich is not tokenized and then use the regex on that one. On 05/22/2013 11:53 AM, Stéphane Habett Roux wrote: I just can't get the $ endpoint to work. I am not sure but I heard it works with the Java Regex engine (a little obvious if it is true ...), so any Java regex tutorial would help you. On 05/22/2013 11:42 AM, Sagar Chaturvedi wrote: Yes, it works for me too. But many times result is not as expected. Is there some guide on use of regex in solr? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:00 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr I don't think so, it always worked for me without anything special, just try it and see :) On 05/22/2013 11:26 AM, Sagar Chaturvedi wrote: @Oussama Thank you for your reply. Is it as simple as that? I mean no additional settings required? 
-Original Message-
From: Oussama Jilal [mailto:jilal.ouss...@gmail.com]
Sent: Wednesday, May 22, 2013 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Regular expression in solr

You can write a regular expression query like this (you need to specify the regex between slashes / ):

fieldName:/[rR]egular.*/

On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote:

Hi, How do we search based upon regular expressions in solr?

Regards,
Sagar

DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
Re: Regular expression in solr
If the indexed data includes positions, it should be possible to implement ^ and $ as the first and last positions. On 05/22/2013 04:08 AM, Oussama Jilal wrote: There is no ^ or $ in the Solr regex, since the regular expression matches tokens (not the complete indexed text). So the results you get will basically depend on how you index: if you use the regex on a tokenized field and that is not what you want, try using a copy field which is not tokenized, and then use the regex on that one. On 05/22/2013 11:53 AM, Stéphane Habett Roux wrote: I just can't get the $ endpoint to work. I am not sure, but I heard it works with the Java regex engine (a little obvious if it is true ...), so any Java regex tutorial would help you. On 05/22/2013 11:42 AM, Sagar Chaturvedi wrote: Yes, it works for me too. But many times the result is not as expected. Is there some guide on the use of regex in Solr? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:00 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr I don't think so, it always worked for me without anything special, just try it and see :) On 05/22/2013 11:26 AM, Sagar Chaturvedi wrote: @Oussama Thank you for your reply. Is it as simple as that? I mean, no additional settings required? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 3:37 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr You can write a regular expression query like this (you need to specify the regex between slashes / ) : fieldName:/[rR]egular.*/ On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote: Hi, How do we search based upon regular expressions in solr? Regards, Sagar
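As the thread says, Lucene/Solr regex queries run against individual indexed terms, with each pattern implicitly anchored to the whole term, which is why ^ and $ have no effect on a tokenized field. A minimal Python analogy of that behavior (this is not Solr's actual engine, and the whitespace split is a crude stand-in for a tokenizer):

```python
import re

# Solr's regex queries match individual indexed terms, not the whole
# stored text. Simulate a tokenized field with a naive tokenizer.
text = "Regular expressions in Solr"
tokens = text.lower().split()

# A pattern written against "the whole field value" matches nothing
# token-by-token, because no single token spans the phrase.
whole = [t for t in tokens if re.fullmatch(r"regular expressions.*", t)]

# Each token is its own matching unit, so "regular.*" matches the first
# token, just as fieldName:/regular.*/ would match that indexed term.
per_token = [t for t in tokens if re.fullmatch(r"regular.*", t)]
print(whole)      # []
print(per_token)  # ['regular']
```

This also illustrates the copy-field advice: on an untokenized (string) field the entire value is one term, so a regex can effectively match the whole text.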
Tool to read Solr4.2 index
Hi All, We can use lukeall-4.0 to read a Solr 3.x index. Is there anything similar to read a Solr 4.x index? Please help. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Tool-to-read-Solr4-2-index-tp4065448.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date Field
What is the format of the UTC string? An example? Thanks. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, May 22, 2013 00:03 To: solr-user@lucene.apache.org Subject: Re: Date Field : 2) Chain TemplateTransformer either by itself or before the : DateFormatTransformer (not sure if the evaluator spits the date out or : not). Either way, I think you should be able to use the formatDate : function in the transformer That sounds correct .. it should be possible to use TemplateTransformer (or something like RegexTransformer) prior to DateFormatTransformer so that the value you extract from the xpath (i.e.: 5/13) gets the literal string UTC appended to it, and then configure a dateTimeFormat that parses the timezone from the value (i.e.: MM/yy z) -Hoss
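Hoss's two-step chain can be sketched in Python to show what each transformer contributes (the Java format MM/yy z roughly corresponds to Python's %m/%y %Z; the raw value 5/13 is taken from the message above, and the variable names are illustrative):

```python
from datetime import datetime

# Step 1 (TemplateTransformer-style): append the literal string " UTC"
# to the raw value extracted by xpath, e.g. "5/13" -> "5/13 UTC".
raw = "5/13"
templated = raw + " UTC"

# Step 2 (DateFormatTransformer-style): parse with a format that also
# consumes the timezone token, the analogue of Java's "MM/yy z".
dt = datetime.strptime(templated, "%m/%y %Z")
print(dt.year, dt.month)  # 2013 5
```

The key point is that the timezone is injected as literal text before parsing, so the date format itself can declare where the timezone sits in the value.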
Re: Tool to read Solr4.2 index
This might help http://wiki.apache.org/solr/LukeRequestHandler -- Shreejay Nair Sent from my mobile device. Please excuse brevity and typos. On Wednesday, May 22, 2013 at 13:47, gpssolr2020 wrote: Hi All, We can use lukeall4.0 for reading Solr3.x index . Like that do we have anything to read solr 4.x index. Please help. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Tool-to-read-Solr4-2-index-tp4065448.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tool to read Solr4.2 index
Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Tool-to-read-Solr4-2-index-tp4065448p4065453.html Sent from the Solr - User mailing list archive at Nabble.com.
fq facet on double and non-indexed field
Hi, I am trying to apply filtering on a non-indexed double field, but it does not return any results. So can't we use fq on a non-indexed field? The error returned is: "can not use FieldCache on a field which is neither indexed nor has doc values: EXCH_RT_AMT" (status code 400). We are using Solr 4.2. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/fq-facet-on-double-and-non-indexed-field-tp4065457.html Sent from the Solr - User mailing list archive at Nabble.com.
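The error message itself points at the fix: filtering and faceting need the field to be either indexed or backed by doc values. A hedged schema.xml sketch (the type name tdouble is an assumption, and docValues support for Trie fields only arrived around Solr 4.2, so verify it against your build; a full re-index is required after the change):

```xml
<!-- Either indexed="true" or docValues="true" makes fq/facet possible -->
<field name="EXCH_RT_AMT" type="tdouble" indexed="false" stored="true" docValues="true"/>
```

The simpler alternative, if index size is not a concern, is just to set indexed="true" on the field.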
Low Priority: Lucene Facets in Solr?
Hi All, Not really a pressing need for this at all, but having worked through a few tutorials, I was wondering if there was any work being done to incorporate Lucene Facets into solr: http://lucene.apache.org/core/4_3_0/facet/org/apache/lucene/facet/doc-files/userguide.html Brendan
Re: Scheduling DataImports
For the first: a cron job that hits the DIH trigger URL will probably be the easiest way. I'm not sure I understood the second question. How do you store/know when the entries expire? And how do you pull those specific entries? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, May 22, 2013 at 3:36 PM, smanad sma...@gmail.com wrote: Hi, I am new to Solr and recently started exploring it for search/sort needs in our webapp. I have a couple of questions, as below. (I am using Solr 4.2.1 with the default core named collection1.) 1. We have a use case where we would like to index data every 10 minutes (on average). What's the best way to schedule a data import every 10 minutes or so? A cron job? 2. Also, we are indexing data returned from an API which returns different cache TTLs. How can I re-index after a TTL expires? Some process which polls for the soon-to-expire entries and issues a data-import command? Any pointers will be much appreciated. Thanks, -M -- View this message in context: http://lucene.472066.n3.nabble.com/Scheduling-DataImports-tp4065435.html Sent from the Solr - User mailing list archive at Nabble.com.
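For reference, the cron route can be a single crontab entry that hits the DIH handler every 10 minutes; the host, port, core name, and choice of command below are assumptions to adapt to your setup (delta-import is the usual choice for incremental updates when the DIH config supports it):

```
*/10 * * * * curl -s "http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=false" > /dev/null
```

Note that DIH ignores a new command while an import is still running, so an overlapping cron firing is harmless.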
Re: Low Priority: Lucene Facets in Solr?
The topic has come up, but nobody has expressed a sense of urgency. It actually has a placeholder Jira: https://issues.apache.org/jira/browse/SOLR-4774 Feel free to add your encouragement there. -- Jack Krupansky -Original Message- From: Brendan Grainger Sent: Wednesday, May 22, 2013 6:39 PM To: solr-user@lucene.apache.org Subject: Low Priority: Lucene Facets in Solr? Hi All, Not really a pressing need for this at all, but having worked through a few tutorials, I was wondering if there was any work being done to incorporate Lucene Facets into solr: http://lucene.apache.org/core/4_3_0/facet/org/apache/lucene/facet/doc-files/userguide.html Brendan
Using alternate Solr index location for SolrCloud
Our prod environment is going to be on Azure. As such, I want our index to live on the Azure VM's local storage rather than the default VM disk (blob storage). Normally, I just use /var/opt/tomcat7/PORT/solr/collection1/data, but I want to use something else. I am also using the Collections API to create my collections (I have several). Is my only option to hardcode the data directory in the collection's solrconfig.xml? I would prefer to avoid this because not all environments will have this same disk structure. Ideally, I could put a parameter in the Collections API for the instance directory. I see this is available for Core Admin, but I don't see it for the Collections API itself. Or failing that, solr.xml would be better. Does anyone have any suggestions? Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614 [image: CNET Content Solutions]
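One hedged option, assuming Solr 4.x and the stock solrconfig.xml idiom: point dataDir at a system property and set that property per environment at JVM startup, so the same shared config works on every disk layout. The property name and path below are examples, not requirements:

```xml
<!-- solrconfig.xml: use the property when set, fall back to the core default -->
<dataDir>${solr.data.dir:}</dataDir>
```

Then start Tomcat on the Azure VMs with, e.g., -Dsolr.data.dir=/mnt/resource/solr/data. One caveat: with several cores on the same node a single shared property would make them collide on one directory, so per-core values (for example via each core's properties) would be needed in that case.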
Storing and retrieving json
Hello all, I am facing a need to store and retrieve a JSON string in a field. E.g., imagine a schema like below. [Please note that this is just an example, not the actual specification.] <str name="carName" type="string" indexed="true" stored="false"/> <str name="carDescription" type="string" indexed="false" stored="false"/> carDescription is a JSON string. An example would be {"model": 1988, "type": "manual"}. I don't need to search on carDescription. I want to store some JSON data and retrieve it. When I feed JSON data to the carDescription field through DIH, the response for the query looks like {\"model\": 1988, \"type\": \"manual\"}. All the quotes are escaped. I don't want this. I want the original unmodified data. Is there a way to do this? Thanks, Karthick
Re: Storing and retrieving json
Yes, the quotes need to be escaped, since they are contained within a quoted string, which you didn't show. That is the proper convention for representing strings in JSON. Are you familiar with the JSON format? If not, try XML - it won't have to represent a string as a quoted JSON string. If you read and parse the Solr response with a JSON parser, you should get your original JSON string value back intact. Now, you may want to do a JSON parse of that string itself, but that has nothing to do with the Solr JSON response itself. As you said, you wanted to store and retrieve JSON as a string field, which Solr appears to be doing correctly. -- Jack Krupansky -Original Message- From: Karthick Duraisamy Soundararaj Sent: Wednesday, May 22, 2013 8:03 PM To: solr-user@lucene.apache.org Subject: Storing and retrieving json Hello all, I am facing a need to store and retrieve a JSON string in a field. E.g., imagine a schema like below. [Please note that this is just an example, not the actual specification.] <str name="carName" type="string" indexed="true" stored="false"/> <str name="carDescription" type="string" indexed="false" stored="false"/> carDescription is a JSON string. An example would be {"model": 1988, "type": "manual"}. I don't need to search on carDescription. I want to store some JSON data and retrieve it. When I feed JSON data to the carDescription field through DIH, the response for the query looks like {\"model\": 1988, \"type\": \"manual\"}. All the quotes are escaped. I don't want this. I want the original unmodified data. Is there a way to do this? Thanks, Karthick
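Jack's point can be verified in a few lines: the escaping exists only on the wire, and a JSON parse restores the stored string exactly. The response shape below is illustrative, not a real Solr response:

```python
import json

# A JSON string stored in a field, embedded in a JSON response envelope.
car_description = '{"model": 1988, "type": "manual"}'
solr_response = json.dumps(
    {"response": {"docs": [{"carDescription": car_description}]}}
)

# On the wire, the inner quotes appear escaped inside the quoted string...
assert '\\"model\\"' in solr_response

# ...but decoding the response restores the stored value exactly.
decoded = json.loads(solr_response)["response"]["docs"][0]["carDescription"]
print(decoded == car_description)  # True

# A second parse turns the stored JSON string itself into a structure.
print(json.loads(decoded)["type"])  # manual
```

So the client never needs to unescape anything by hand; it only needs to parse the response with a JSON library instead of reading the raw bytes.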
Re: List of Solr Query Parsers
Hello, I have just created a new JIRA issue; if you are interested in trying out the new query parser, please visit: https://issues.apache.org/jira/browse/LUCENE-5014 Thanks, roman On Mon, May 6, 2013 at 5:36 PM, Jan Høydahl jan@cominvent.com wrote: Added. Please try editing the page now. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 6. mai 2013 kl. 19:58 skrev Roman Chyla roman.ch...@gmail.com: Hi Jan, My login is RomanChyla Thanks, Roman On 6 May 2013 10:00, Jan Høydahl jan@cominvent.com wrote: Hi Roman, This sounds great! Please register as a user on the WIKI and give us your username here, then we'll grant you editing karma so you can edit the page yourself! The NEAR/5 syntax is really something I think we should get into the default lucene parser. Can't wait to have a look at your code. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 6. mai 2013 kl. 15:41 skrev Roman Chyla roman.ch...@gmail.com: Hi Jan, Please add this one http://29min.wordpress.com/category/antlrqueryparser/ - I can't edit the wiki. This parser is written with ANTLR on top of Lucene's modern query parser. There is a version which implements the Lucene standard QP, as well as a version which includes proximity operators, multi-token synonym handling, and all of Solr's qparsers using function syntax, i.e., for a query like: multi synonym NEAR/5 edismax(foo) I would like to create a JIRA ticket soon. Thanks Roman On 6 May 2013 09:21, Jan Høydahl jan@cominvent.com wrote: Hi, I just added a Wiki page to try to gather a list of all known Solr query parsers in one place, both those which are part of Solr and those in JIRA or 3rd party. http://wiki.apache.org/solr/QueryParser If you know about other cool parsers out there, please add to the list. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Question about Coordination factor
Hello folks, I have a question about the coordination factor, to make sure my understanding of this value is correct. If I have documents that contain some keywords like the following:

Doc1: A, B, C
Doc2: A, C
Doc3: B, C

And my query is A OR B OR C OR D. In this case, the coord factor value for each document will be the following:

Doc1: 3/4
Doc2: 2/4
Doc3: 2/4

In the same fashion, the respective coord factor values are the following if I have the query C OR D:

Doc1: 1/2
Doc2: 1/2
Doc3: 1/2

Is this correct, or did I miss something? Please correct me if I am wrong. Regards, Kazuaki Hiraga
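The arithmetic above can be checked directly; the classic Lucene TF-IDF similarity (the default in Solr 4.x) defines coord(q, d) as the number of query clauses a document matches divided by the total number of clauses:

```python
# Documents and their terms, as given in the question.
docs = {"Doc1": {"A", "B", "C"}, "Doc2": {"A", "C"}, "Doc3": {"B", "C"}}

def coord(query_terms, doc_terms):
    # Number of matching query clauses over the total clause count.
    overlap = len(query_terms & doc_terms)
    return overlap / len(query_terms)

# Query: A OR B OR C OR D  -> expected 3/4, 2/4, 2/4
q1 = {"A", "B", "C", "D"}
print([coord(q1, docs[d]) for d in ("Doc1", "Doc2", "Doc3")])
# [0.75, 0.5, 0.5]

# Query: C OR D  -> every document matches only C, so coord is 1/2 for all
q2 = {"C", "D"}
print([coord(q2, docs[d]) for d in ("Doc1", "Doc2", "Doc3")])
# [0.5, 0.5, 0.5]
```

So both sets of values in the question come out as expected; the D clause matches nothing but still counts in the denominator.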