Re: Apache solr sink issue

2014-08-19 Thread Gopal Patwa
Do you have the tag <uniqueKey>id</uniqueKey> defined in your schema? It
is not mandatory to have a unique key field, but if you need it then you have to
provide it, else you can remove it. See the wiki page below for more details:

http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field

Some options for generating this field if your document cannot derive one:

https://wiki.apache.org/solr/UniqueKey
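One client-side option along the lines of that wiki page is to derive a deterministic id before indexing. A minimal Python sketch (the field name, function, and hashing choice are illustrative assumptions, not part of any Solr API):

```python
import hashlib
import uuid

def ensure_unique_key(doc, content_field="body"):
    """Add an 'id' field to a document dict that lacks one."""
    if "id" in doc:
        return doc
    content = doc.get(content_field)
    if content:
        # Deterministic: re-indexing the same content overwrites the
        # earlier copy instead of creating a duplicate document.
        doc["id"] = hashlib.sha1(content.encode("utf-8")).hexdigest()
    else:
        # No usable content: fall back to a random UUID.
        doc["id"] = str(uuid.uuid4())
    return doc

doc = ensure_unique_key({"body": "2014-08-19 15:38:56 some log line"})
```

Solr's UUIDUpdateProcessorFactory (covered on the linked page) can do something similar server-side.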




On Mon, Aug 18, 2014 at 10:48 PM, Jeniba Johnson 
jeniba.john...@lntinfotech.com wrote:

 Hi,

 I want to index a log file into Solr using Flume + the Apache Solr sink.
 I am referring to the URL mentioned below:

 https://cwiki.apache.org/confluence/display/FLUME/How+to+Setup+Solr+Sink+for+Flume


 Error  from flume console
 2014-08-19 15:38:56,451 (concurrentUpdateScheduler-2-thread-1) [ERROR -
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.handleError(ConcurrentUpdateSolrServer.java:354)]
 error
 java.lang.Exception: Bad Request
 request: http://xxx.xx.xx:8983/solr/update?wt=javabin&version=2
 at
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:208)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)


 Error  from solr console
 473844 [qtp176433427-19] ERROR org.apache.solr.core.SolrCore -
 org.apache.solr.common.SolrException: Document is missing mandatory
 uniqueKey field: id


 Can anyone help me with this issue and with the steps for
 integrating Flume with the Solr sink?



 Regards,
 Jeniba Johnson


 
 The contents of this e-mail and any attachment(s) may contain confidential
 or privileged information for the intended recipient(s). Unintended
 recipients are prohibited from taking action on the basis of information in
 this e-mail and using or disseminating the information, and must notify the
 sender and delete it from their system. LT Infotech will not accept
 responsibility or liability for the accuracy or completeness of, or the
 presence of any virus or disabling code in this e-mail



Any recommendation for Solr Cloud version.

2014-08-19 Thread Lee Chunki
Hi,

I am trying to build a new Solr Cloud to replace our old cluster (2
indexers + 2 searchers).
The version I am currently using is 4.1.

Is newer better, i.e. should I go with version 4.9.0?

Please give any suggestion for me.

Thanks,
Chunki.




Exact match?

2014-08-19 Thread William Bell
If I have a long string, how do I match on 90% of the terms to see if there
is a duplicate?

If I add the field and index it, what is the best way to enforce a 90% match,
i.e. (# of terms that match) / (# of terms in the field)?


-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Syntax unavailable for parameter substitution Solr 3.5

2014-08-19 Thread deepaksshettigar
Thanks Chris. Yes, I am comfortable writing Java code and will try to give it a
shot.

Thanks
Deepak 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Syntax-unavailable-for-parameter-substitution-Solr-3-5-tp4153197p4153722.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr, weblogic managed server and log4j logging

2014-08-19 Thread Croci Francesco Luigi (ID SWS)
Maybe some of you use Solr with WebLogic and can help me...

I have weblogic 12.1.3 and would like to deploy/run solr on a managed server.

I started the node manager, created a server named server-solr, and deployed
Solr (4.7.9).
In the server start tab of the server configuration I added
C:\lib\wllog4j.jar;C:\lib\log4j-1.2.16.jar to the Class Path and
-Dlog4j.configuration=C:\download\log4j.properties
-Dweblogic.log.Log4jLoggingEnabled=true to the Arguments.

When I try to start the server I get the following error:

Aug 19, 2014 10:53:07 AM CEST Critical WebLogicServer BEA-000386 
Server subsystem failed. Reason: A MultiException has 6 exceptions.  They are:
1. java.lang.NoSuchMethodError: 
com.bea.logging.LogBufferHandler.getServerLogBufferHandler()Lcom/bea/logging/LogBufferHandler;
2. java.lang.IllegalStateException: Unable to perform operation: post construct 
on weblogic.diagnostics.lifecycle.LoggingServerService
3. java.lang.IllegalArgumentException: While attempting to resolve the 
dependencies of weblogic.diagnostics.lifecycle.DiagnosticFoundationService 
errors were found
4. java.lang.IllegalStateException: Unable to perform operation: resolve on 
weblogic.diagnostics.lifecycle.DiagnosticFoundationService
5. java.lang.IllegalArgumentException: While attempting to resolve the 
dependencies of com.oracle.injection.integration.CDIIntegrationService errors 
were found
6. java.lang.IllegalStateException: Unable to perform operation: resolve on 
com.oracle.injection.integration.CDIIntegrationService

Do I still miss something?

Thank you in advance

Francesco




Re: solr cloud going down repeatedly

2014-08-19 Thread Jakov Sosic

On 08/18/2014 08:38 PM, Shawn Heisey wrote:


With an 8GB heap and UseConcMarkSweepGC as your only GC tuning, I can
pretty much guarantee that you'll see occasional GC pauses of 10-15
seconds, because I saw exactly that happening with my own setup.

This is what I use now:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

I can't claim that my problem is 100% solved, but collections that go
over one second are *very* rare now, and I'm pretty sure they are all
under two seconds.


Thank you for your comment.

How did you test these settings? I mean, that's a lot of tuning and I 
would like to set up some test environment to be certain this is what I 
want...




sample Cell schema question

2014-08-19 Thread jmlucjav
In the sample schema.xml I can see this:

<!-- Main body of document extracted by SolrCell.
     NOTE: This field is not indexed by default, since it is also
     copied to "text" using copyField below. This is to save space.
     Use this field for returning and highlighting document content.
     Use the "text" field to search the content. -->
<field name="content" type="text_general" indexed="false" stored="true"
       multiValued="true"/>


I am wondering: how does splitting this into two fields (text/content) save
space?


Re: sample Cell schema question

2014-08-19 Thread jmlucjav
OK, I had not noticed that "text" also contains the other metadata like
keywords, description, etc. Never mind!


On Tue, Aug 19, 2014 at 11:28 AM, jmlucjav jmluc...@gmail.com wrote:

 In the sample schema.xml I can see this:

 <!-- Main body of document extracted by SolrCell.
      NOTE: This field is not indexed by default, since it is also
      copied to "text" using copyField below. This is to save space.
      Use this field for returning and highlighting document content.
      Use the "text" field to search the content. -->
 <field name="content" type="text_general" indexed="false" stored="true"
        multiValued="true"/>


 I am wondering, how does having this split in two fields text/content save
 space?



Re: BlendedInfixSuggester index write.lock failures on core reload

2014-08-19 Thread Varun Thacker
Hi,

Yes this indeed is a bug. I am currently trying to get a patch for it.

This is the Jira issue - https://issues.apache.org/jira/browse/SOLR-6246


On Thu, Aug 14, 2014 at 7:52 PM, Zisis Tachtsidis zist...@runbox.com
wrote:

 Hi all,

 I'm using Solr 4.9.0 and have set up a spellcheck component for returning
 suggestions. The configuration inside my solr.SpellCheckComponent is as
 follows:

 <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
 <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.BlendedInfixLookupFactory</str>

 along with a custom value for
 <str name="indexPath"></str>

 The server is starting properly and data gets indexed but once i hit the
 'Reload' button from 'Core Admin' I get the following error.

 null:org.apache.solr.common.SolrException: Error handling 'reload' action
 at

 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:791)
 at

 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:224)
 at

 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:187)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at

 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at

 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89)
 at

 com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:156)
 at

 com.caucho.server.webapp.AccessLogFilterChain.doFilter(AccessLogFilterChain.java:95)
 at

 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:289)
 at
 com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:838)
 at

 com.caucho.network.listen.TcpSocketLink.dispatchRequest(TcpSocketLink.java:1345)
 at

 com.caucho.network.listen.TcpSocketLink.handleRequest(TcpSocketLink.java:1301)
 at

 com.caucho.network.listen.TcpSocketLink.handleRequestsImpl(TcpSocketLink.java:1285)
 at

 com.caucho.network.listen.TcpSocketLink.handleRequests(TcpSocketLink.java:1193)
 at

 com.caucho.network.listen.TcpSocketLink.handleAcceptTaskImpl(TcpSocketLink.java:992)
 at
 com.caucho.network.listen.ConnectionTask.runThread(ConnectionTask.java:117)
 at
 com.caucho.network.listen.ConnectionTask.run(ConnectionTask.java:93)
 at

 com.caucho.network.listen.SocketLinkThreadLauncher.handleTasks(SocketLinkThreadLauncher.java:169)
 at

 com.caucho.network.listen.TcpSocketAcceptThread.run(TcpSocketAcceptThread.java:61)
 at
 com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)
 at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118)
 Caused by: org.apache.solr.common.SolrException: Unable to reload core:
 autocomplete
 at
 org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:911)
 at
 org.apache.solr.core.CoreContainer.reload(CoreContainer.java:660)
 at

 org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:789)
 ... 24 more
 Caused by: org.apache.solr.common.SolrException
 at org.apache.solr.core.SolrCore.init(SolrCore.java:868)
 at org.apache.solr.core.SolrCore.reload(SolrCore.java:426)
 at
 org.apache.solr.core.CoreContainer.reload(CoreContainer.java:650)
 ... 25 more
 Caused by: java.lang.RuntimeException
 at

 org.apache.solr.spelling.suggest.fst.BlendedInfixLookupFactory.create(BlendedInfixLookupFactory.java:102)
 at
 org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:105)
 at

 org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:636)
 at
 org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:651)
 at org.apache.solr.core.SolrCore.init(SolrCore.java:851)
 ... 27 more

 Debugging the Solr code I found that the original exception comes from the
 IndexWriter construction inside AnalyzingInfixSuggester.java (more
 specifically org.apache.lucene.store.Lock:89). The exception is "Lock obtain
 timed out: NativeFSLock@$indexPath/write.lock" but it seems to be hidden by
 the RuntimeException thrown by BlendedInfixLookupFactory.

 If I use the default indexPath I get another error (write lock related
 again) in the logs.
 org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out:
 NativeFSLock@$indexPath/blendedInfixSuggesterIndexDir/write.lock
 at org.apache.lucene.store.Lock.obtain(Lock.java:89)
 at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:724)
 at

 

Re: sample Cell schema question

2014-08-19 Thread Aman Tandon
I have a question: does storing the data in copyFields save space?

With Regards
Aman Tandon


On Tue, Aug 19, 2014 at 3:02 PM, jmlucjav jmluc...@gmail.com wrote:

 ok, I had not noticed text contains also the other metadata like keywords,
 description etc, nevermind!


 On Tue, Aug 19, 2014 at 11:28 AM, jmlucjav jmluc...@gmail.com wrote:

  In the sample schema.xml I can see this:
 
  <!-- Main body of document extracted by SolrCell.
       NOTE: This field is not indexed by default, since it is also
       copied to "text" using copyField below. This is to save space.
       Use this field for returning and highlighting document content.
       Use the "text" field to search the content. -->
  <field name="content" type="text_general" indexed="false" stored="true"
         multiValued="true"/>
 
 
  I am wondering, how does having this split in two fields text/content
 save
  space?
 



Re: sample Cell schema question

2014-08-19 Thread jmlucjav
No, it does not.

The intent here, I think, is not to duplicate stored info: other metadata
fields like author, keywords, etc. are already stored, so if "text" were
stored ("text" is where all the fields: content, author, etc. are copied), it
would contain some duplicate info.


On Tue, Aug 19, 2014 at 1:05 PM, Aman Tandon amantandon...@gmail.com
wrote:

 I have a question, does storing the data in copyfields save space?

 With Regards
 Aman Tandon


 On Tue, Aug 19, 2014 at 3:02 PM, jmlucjav jmluc...@gmail.com wrote:

  ok, I had not noticed text contains also the other metadata like
 keywords,
  description etc, nevermind!
 
 
  On Tue, Aug 19, 2014 at 11:28 AM, jmlucjav jmluc...@gmail.com wrote:
 
   In the sample schema.xml I can see this:
  
   <!-- Main body of document extracted by SolrCell.
        NOTE: This field is not indexed by default, since it is also
        copied to "text" using copyField below. This is to save space.
        Use this field for returning and highlighting document content.
        Use the "text" field to search the content. -->
   <field name="content" type="text_general" indexed="false" stored="true"
          multiValued="true"/>
  
  
   I am wondering, how does having this split in two fields text/content
  save
   space?
  
 



Re: sample Cell schema question

2014-08-19 Thread Aurélien MAZOYER
"indexed" means you can search it; "stored" means you can return the
value to the user or highlight it.

Both consume disk space.
A copyField is not a kind of special field: it is a directive that
copies one field's values to another field. There are many use cases for
using copy fields.
In the example, we use a specific field, "text", as a default field where
users will perform their searches. That is why we copy all the fields that we
want to search into that specific field "text" (note that there are other
ways to search multiple fields: have a look at
http://wiki.apache.org/solr/ExtendedDisMax).
For example, the field "content" is copied to the "text" field (which is
indexed) for searching. As we will use the field "text" to perform our
search, we don't need to index the "content" field too, and since we don't,
you save some disk space.
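The index-time effect of copyField described above can be modeled in a few lines of Python (a sketch only; the helper function is hypothetical, and the field names just follow the example schema):

```python
def apply_copy_fields(doc, copy_rules):
    """Simulate what a <copyField source=... dest=...> directive does at
    index time: source values are appended to the destination field for
    analysis, without the source being stored a second time."""
    out = dict(doc)
    for source, dest in copy_rules:
        if source in doc:
            values = doc[source] if isinstance(doc[source], list) else [doc[source]]
            out.setdefault(dest, []).extend(values)
    return out

rules = [("content", "text"), ("author", "text"), ("keywords", "text")]
indexed = apply_copy_fields(
    {"content": "main body", "author": "jane", "keywords": "solr"}, rules
)
# "text" now carries all three values for searching; only "content"
# itself needs stored="true" for display and highlighting.
```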


Regards,

Aurélien



On 19/08/2014 13:05, Aman Tandon wrote:

I have a question, does storing the data in copyfields save space?

With Regards
Aman Tandon


On Tue, Aug 19, 2014 at 3:02 PM, jmlucjav jmluc...@gmail.com wrote:


ok, I had not noticed text contains also the other metadata like keywords,
description etc, nevermind!


On Tue, Aug 19, 2014 at 11:28 AM, jmlucjav jmluc...@gmail.com wrote:


In the sample schema.xml I can see this:

 <!-- Main body of document extracted by SolrCell.
      NOTE: This field is not indexed by default, since it is also
      copied to "text" using copyField below. This is to save space.
      Use this field for returning and highlighting document content.
      Use the "text" field to search the content. -->
 <field name="content" type="text_general" indexed="false" stored="true"
        multiValued="true"/>


I am wondering, how does having this split in two fields text/content

save

space?





Re: Exact match?

2014-08-19 Thread Erik Hatcher
Maybe use dismax for this?  Something like q={!dismax qf=field_name
mm=90%}query_string, or more verbosely and separately,
q=query_string&defType=dismax&mm=90%
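Built as URL parameters, the second form might look like this in Python (the host, core path, and field name are placeholder assumptions):

```python
from urllib.parse import urlencode

# mm=90% means at least 90% of the optional query clauses must match,
# which approximates the "90% of terms" duplicate check.
params = {
    "q": "long query string with many terms",
    "defType": "dismax",
    "qf": "field_name",
    "mm": "90%",
    "fl": "id,score",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
```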

Erik



On Aug 19, 2014, at 2:43 AM, William Bell billnb...@gmail.com wrote:

 If I have a long string, how do I match on 90% of the terms to see if there
 is a duplicate?
 
 If I add the field and index it, what is the best way to return 90%?
 
 # terms match
 # of terms in the field?
 
 
 -- 
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076



Re: faceted query with stats not working in solrj

2014-08-19 Thread tedsolr
That's a good suggestion, I hadn't checked that log file. What I found that
works, is hitting these methods on the SolrQuery object:

query.setGetFieldStatistics(true);
query.setParam(stats.field, MyStatsFieldName);
query.setParam(stats.facet, MyFacetFieldName);

Now I see the stats in the QueryResponse.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/faceted-query-with-stats-not-working-in-solrj-tp4153608p4153764.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Any recommendation for Solr Cloud version.

2014-08-19 Thread Mark Miller


On August 19, 2014 at 2:39:32 AM, Lee Chunki (lck7...@coupang.com) wrote:
  Is newer better, i.e. should I go with version 4.9.0?

Yes, certainly. 

-- 
Mark Miller
about.me/markrmiller


Re: Apache Solr Wiki

2014-08-19 Thread Julie . Voss
Can I also have access to the wiki? We are at the outset of a Solr/Hybris 
implementation.



From:   Mark Sun mark...@motionelements.com
To: solr-user@lucene.apache.org
Date:   08/18/2014 08:06 PM
Subject:Apache Solr Wiki



Dear Solr Wiki admin,

We are using Solr for our multilingual Asian-language keyword search, as
well as a visual-similarity search engine (via the pixolution plugin). We would
like to update the "Powered by Solr" section, as well as help add to
the knowledge base for other Solr setups.

Can you add me, username MarkSun, as a contributor to the wiki?

Thank you!

Cheers,
Mark Sun
CTO

MotionElements Pte Ltd
190 Middle Road, #10-05 Fortune Centre
Singapore 188979
mark...@motionelements.com

www.motionelements.com
=
Asia-inspired Stock Animation | Video Footage l AE Template online
marketplace
=
This message may contain confidential and/or privileged information.  If
you are not the addressee or authorized to receive this for the addressee,
you must not use, copy, disclose or take any action based on this message
or any information herein. If you have received this message in error,
please advise the sender immediately by reply e-mail and delete this
message.  Thank you for your cooperation.



Re: solr cloud going down repeatedly

2014-08-19 Thread Shawn Heisey
On 8/19/2014 3:12 AM, Jakov Sosic wrote:
 Thank you for your comment.

 How did you test these settings? I mean, that's a lot of tuning and I
 would like to set up some test environment to be certain this is what
 I want...

I included a section on tools when I wrote this page:

http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems

Thanks,
Shawn



Near Realtime get

2014-08-19 Thread Philippe Soares
Hi,
I tried the realtime get today in a SolrCloud setup, and it's returning
only a subset of my stored fields. Did I miss a parameter that would
return all the fields?

Thanks for your help !

Philippe

This email message and any attachments are confidential and may be privileged.  
If you are not the intended recipient, please notify GenomeQuest immediately by 
replying to this message or by sending an email to postmas...@genomequest.com 
and destroy all copies of this message and any attachments without reading or 
disclosing their contents. Thank you.


Indexing and Querying MS SQL Server 2012 Spatial

2014-08-19 Thread Bostic, Alex
Hello, I'm new to Solr.
I have a SQL Server 2012 database with spatial columns (points/lines/polys).
Do you have any resources to point to for the following:
- Creating a Solr index of a SQL Server spatial table
- Bounding box query (intersect) example, possibly with a front-end from GMaps
or OpenLayers
I'm currently reading Apache Solr Beginner's Guide and have reviewed
https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
I am able to index and query my non-spatial data; I am just looking for some
resource that may have more detail about how to set everything up.
I can provide more detail if needed.
Thanks

Alex Bostic
GIS Developer
URS Corporation
12420 Milestone Center Drive, Suite 150
Germantown, MD 20876
direct line: 301-820-3287
cell line: 301-213-2639



This e-mail and any attachments contain URS Corporation confidential 
information that may be proprietary or privileged. If you receive this message 
in error or are not the intended recipient, you should not retain, distribute, 
disclose or use any of this information and you should destroy the e-mail and 
any attachments or copies.


Re: Apache solr sink issue

2014-08-19 Thread Erick Erickson
While Gopal is correct that having a uniqueKey is not mandatory,
if you're using SolrCloud it _is_ necessary. And I don't know the
internals of the Flume Solr Sink, but if it uses CloudSolrServer
under the covers I'd be surprised if it worked without a uniqueKey
defined. And I'd guess it does use CloudSolrServer.

The error is explicit. You're sending documents to Solr that
don't have an id field (or whatever your uniqueKey is set
to in schema.xml).

Best,
Erick

On Mon, Aug 18, 2014 at 11:08 PM, Gopal Patwa gopalpa...@gmail.com wrote:
 Do you have the tag <uniqueKey>id</uniqueKey> defined in your schema? It
 is not mandatory to have a unique key field, but if you need it then you have to
 provide it, else you can remove it. See the wiki page below for more details:

 http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field

 Some options to generate this field if your document cannot derive one

 https://wiki.apache.org/solr/UniqueKey




 On Mon, Aug 18, 2014 at 10:48 PM, Jeniba Johnson 
 jeniba.john...@lntinfotech.com wrote:

 Hi,

 I want to index a log file into Solr using Flume + the Apache Solr sink.
 I am referring to the URL mentioned below:

 https://cwiki.apache.org/confluence/display/FLUME/How+to+Setup+Solr+Sink+for+Flume


 Error  from flume console
 2014-08-19 15:38:56,451 (concurrentUpdateScheduler-2-thread-1) [ERROR -
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.handleError(ConcurrentUpdateSolrServer.java:354)]
 error
 java.lang.Exception: Bad Request
 request: http://xxx.xx.xx:8983/solr/update?wt=javabin&version=2
 at
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:208)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)


 Error  from solr console
 473844 [qtp176433427-19] ERROR org.apache.solr.core.SolrCore -
 org.apache.solr.common.SolrException: Document is missing mandatory
 uniqueKey field: id


 Can anyone help me with this issue and with the steps for
 integrating Flume with the Solr sink?



 Regards,
 Jeniba Johnson


 
 The contents of this e-mail and any attachment(s) may contain confidential
 or privileged information for the intended recipient(s). Unintended
 recipients are prohibited from taking action on the basis of information in
 this e-mail and using or disseminating the information, and must notify the
 sender and delete it from their system. LT Infotech will not accept
 responsibility or liability for the accuracy or completeness of, or the
 presence of any virus or disabling code in this e-mail



Re: Apache Solr Wiki

2014-08-19 Thread Erick Erickson
Julie:

bq: Can I also have access to the wiki?

Sure. You need to create a Wiki logon and let us
know what it is before we can add you to the list.

Best,
Erick

On Tue, Aug 19, 2014 at 6:54 AM,  julie.v...@anixter.com wrote:
 Can I also have access to the wiki? We are at the outset of a Solr/Hybris
 implementation.



 From:   Mark Sun mark...@motionelements.com
 To: solr-user@lucene.apache.org
 Date:   08/18/2014 08:06 PM
 Subject:Apache Solr Wiki



 Dear Solr Wiki admin,

 We are using Solr for our multilingual asian language keywords search, as
 well as visual similarity search engine (via pixolution plugin). We would
 like to update the Powered by Solr section. As well as help to add on to
 the knowledge base for other Solr setups.

 Can you add me, username MarkSun as a contributor to the wiki?

 Thank you!

 Cheers,
 Mark Sun
 CTO

 MotionElements Pte Ltd
 190 Middle Road, #10-05 Fortune Centre
 Singapore 188979
 mark...@motionelements.com

 www.motionelements.com
 =
 Asia-inspired Stock Animation | Video Footage l AE Template online
 marketplace
 =
 This message may contain confidential and/or privileged information.  If
 you are not the addressee or authorized to receive this for the addressee,
 you must not use, copy, disclose or take any action based on this message
 or any information herein. If you have received this message in error,
 please advise the sender immediately by reply e-mail and delete this
 message.  Thank you for your cooperation.



Substring and Case In sensitive Search

2014-08-19 Thread Nishanth S
Hi,

I am very new to Solr. How can I make a Solr search on a string field
case-insensitive, and match substrings?

Thanks,
Nishanth


Re: Substring and Case In sensitive Search

2014-08-19 Thread Jack Krupansky
Substring-search a string field using the wildcard, *, at the beginning and end
of the query term.


Case-insensitive match on a string field is not supported.

Instead, copy the string field to a text field, use the keyword tokenizer,
and then apply the lower case filter.


But... review your use case to confirm whether you really need to use a
string field as opposed to a text field.
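Conceptually, the keyword-tokenizer-plus-lowercase chain reduces a `*term*` wildcard query to case-folded substring matching. A rough Python model of that behavior (an illustration, not Solr code):

```python
def analyze_keyword_lowercase(value):
    """Approximate a KeywordTokenizer + LowerCaseFilter chain: the whole
    value stays a single token; only its case is folded."""
    return value.lower()

def wildcard_substring_match(indexed_value, query_term):
    """A *term* wildcard query: true when the analyzed query term appears
    anywhere inside the analyzed indexed value."""
    return analyze_keyword_lowercase(query_term) in analyze_keyword_lowercase(indexed_value)

wildcard_substring_match("Apache Solr Sink", "solr")
```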


-- Jack Krupansky

-Original Message- 
From: Nishanth S

Sent: Tuesday, August 19, 2014 12:03 PM
To: solr-user@lucene.apache.org
Subject: Substring and Case In sensitive Search

Hi,

I am very new to Solr. How can I make a Solr search on a string field
case-insensitive, and match substrings?

Thanks,
Nishanth 



Re: Apache Solr Wiki

2014-08-19 Thread Julie . Voss
user name: julievoss



From:   Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Date:   08/19/2014 10:34 AM
Subject:Re: Apache Solr Wiki



Julie:

bq: Can I also have access to the wiki?

Sure. You need to create a Wiki logon and let us
know what it is before we can add you to the list.

Best,
Erick

On Tue, Aug 19, 2014 at 6:54 AM,  julie.v...@anixter.com wrote:
 Can I also have access to the wiki? We are at the outset of a 
Solr/Hybris
 implementation.



 From:   Mark Sun mark...@motionelements.com
 To: solr-user@lucene.apache.org
 Date:   08/18/2014 08:06 PM
 Subject:Apache Solr Wiki



 Dear Solr Wiki admin,

 We are using Solr for our multilingual asian language keywords search, 
as
 well as visual similarity search engine (via pixolution plugin). We 
would
 like to update the Powered by Solr section. As well as help to add on to
 the knowledge base for other Solr setups.

 Can you add me, username MarkSun as a contributor to the wiki?

 Thank you!

 Cheers,
 Mark Sun
 CTO

 MotionElements Pte Ltd
 190 Middle Road, #10-05 Fortune Centre
 Singapore 188979
 mark...@motionelements.com

 www.motionelements.com
 =
 Asia-inspired Stock Animation | Video Footage l AE Template online
 marketplace
 =
 This message may contain confidential and/or privileged information.  If
 you are not the addressee or authorized to receive this for the 
addressee,
 you must not use, copy, disclose or take any action based on this 
message
 or any information herein. If you have received this message in error,
 please advise the sender immediately by reply e-mail and delete this
 message.  Thank you for your cooperation.




Replication of full index to replica after merge index into leader not working

2014-08-19 Thread Timothy Potter
Hi,

Using the coreAdmin mergeindexes command to merge an index into a
leader (SolrCloud mode on 4.9.0) and the replica does not do a snap
pull from the leader as I would have expected. The merge into the
leader worked like a charm except I had to send a hard commit after
that (which makes sense).

I'm guessing the replica would snap pull from the leader if I
restarted it, but reloading the collection or core does not trigger
the replica to pull from the leader. This seems like an oversight in
the mergeindex interaction with SolrCloud. Seems like the simplest
would be for the leader to send all replicas a request recovery
command after performing the merge.

Advice?

Cheers,
Tim


Re: Replication of full index to replica after merge index into leader not working

2014-08-19 Thread Mark Miller
I’d just file a JIRA. Merge, like optimize and a few other things, was never 
tested or considered in the early SolrCloud days. It’s used in the HDFS stuff, but 
in that case, the index is merged to all replicas and no recovery is necessary.

If you want to make the local filesystem merge work well with SolrCloud, sounds 
like we should write a test and make it work.

--  
Mark Miller
about.me/markrmiller

On August 19, 2014 at 1:20:54 PM, Timothy Potter (thelabd...@gmail.com) wrote:
 Hi,
  
 Using the coreAdmin mergeindexes command to merge an index into a
 leader (SolrCloud mode on 4.9.0) and the replica does not do a snap
 pull from the leader as I would have expected. The merge into the
 leader worked like a charm except I had to send a hard commit after
 that (which makes sense).
  
 I'm guessing the replica would snap pull from the leader if I
 restarted it, but reloading the collection or core does not trigger
 the replica to pull from the leader. This seems like an oversight in
 the mergeindex interaction with SolrCloud. Seems like the simplest
 would be for the leader to send all replicas a request recovery
 command after performing the merge.
  
 Advice?
  
 Cheers,
 Tim
  



Re: Replication of full index to replica after merge index into leader not working

2014-08-19 Thread Mark Miller

On August 19, 2014 at 1:33:10 PM, Mark Miller (markrmil...@gmail.com) wrote:
  sounds like we should write a test and make it work.

Keeping in mind that when using a shared filesystem like HDFS or especially if 
using the MapReduce contrib, you probably won’t want this new behavior.

-- 
Mark Miller
about.me/markrmiller


Help with StopFilterFactory

2014-08-19 Thread heaven
Hi, I have the following text field:

<fieldType name="words_ngram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt"
            ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

url_stopwords.txt looks like:
http
https
ftp
www

So very simple. In index I have:
* twitter.com/testuser

All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser

But none of these match:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser

What am I doing wrong? The analysis screen makes me think something is wrong
with token positions:
http://lucene.472066.n3.nabble.com/file/n4153839/oi7o69.jpg
but I was thinking StopFilterFactory is supposed to remove the
https/http/ftp/www keywords. Why do they appear there at all? That doesn't
make much sense.
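One possible model of the position behavior in that screenshot (a sketch under the assumption that the stop filter preserves position increments, as Lucene's StopFilter does by default): stopwords are removed from the token stream, but the surviving tokens keep their original positions, so a query that began with stopwords starts at a later position than the indexed value.

```python
import re

STOPWORDS = {"http", "https", "ftp", "www"}

def tokenize_with_positions(text):
    # Split on runs of non-word characters (like the pattern tokenizer in
    # the field type above), then drop stopwords while keeping each
    # surviving token's original position: gaps remain where stopwords were.
    tokens = [t for t in re.split(r"[^\w]+", text.lower()) if t]
    return [(pos, tok) for pos, tok in enumerate(tokens, start=1)
            if tok not in STOPWORDS]

tokenize_with_positions("https://www.twitter.com/testuser")
# "twitter" sits at position 3 here, but at position 1 in the
# indexed value "twitter.com/testuser"
```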

Regards,
Alexander



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-tp4153839.html
Sent from the Solr - User mailing list archive at Nabble.com.


Performance of Boolean query with hundreds of OR clauses.

2014-08-19 Thread SolrUser1543
I am using Solr to search for similar pictures.

For this purpose, every image is indexed as a set of descriptors (a
descriptor is a string of 6 chars).
The number of descriptors per image may vary (from a few to many thousands).

When I want to search for a similar image, I extract the descriptors
from it and create a query like:
MyImage:( desc1 desc2 ...  desc n )

The number of descriptors in a query may also vary. Usually it is about 1000.

Of course the performance of this query is very bad, and it may take a few
minutes to return.

Any ideas for performance improvement?

P.S. I also tried to use LIRE, but it does not fit my use case.
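The query construction described above can be sketched as follows. The `MyImage` field name comes from the message; the descriptor values are made up for illustration:

```python
# Build a many-clause OR query of the form described above.
# With the default operator, whitespace-separated terms are ORed together.
def build_descriptor_query(field, descriptors):
    clauses = " ".join(descriptors)
    return "%s:(%s)" % (field, clauses)

q = build_descriptor_query("MyImage", ["a1b2c3", "d4e5f6", "g7h8i9"])
print(q)  # → MyImage:(a1b2c3 d4e5f6 g7h8i9)
```

Note that solrconfig.xml's maxBooleanClauses setting (default 1024) caps how many parsed clauses a query may contain, so ~1000 descriptors is already near that limit.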



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-of-Boolean-query-with-hundreds-of-OR-clauses-tp4153844.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication of full index to replica after merge index into leader not working

2014-08-19 Thread Timothy Potter
Was able to get around it for now by sending the REQUESTRECOVERY command
to the replica. Will open an improvement JIRA, but not sure if it's
worth it as the work-around is pretty clean (IMO).

Tim
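For reference, the work-around described above uses the CoreAdmin REQUESTRECOVERY action, which asks a replica to recover from its leader. A sketch of building the request URL (the host and core names are placeholders, not from a real cluster):

```python
from urllib.parse import urlencode

# Build a CoreAdmin REQUESTRECOVERY URL for a given replica core.
def request_recovery_url(host, core):
    params = urlencode({"action": "REQUESTRECOVERY", "core": core})
    return "http://%s/solr/admin/cores?%s" % (host, params)

print(request_recovery_url("replica1:8983", "collection1_shard1_replica2"))
# → http://replica1:8983/solr/admin/cores?action=REQUESTRECOVERY&core=collection1_shard1_replica2
```

The resulting URL can be hit with curl or a browser against the node hosting the replica.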

On Tue, Aug 19, 2014 at 5:33 PM, Mark Miller markrmil...@gmail.com wrote:
 I’d just file a JIRA. Merge, like optimize and a few other things, were never 
 tested or considered in early SolrCloud days. It’s used in the HDFS stuff, 
 but in that case, the index is merged to all replicas and no recovery is 
 necessary.

 If you want to make the local filesystem merge work well with SolrCloud, 
 sounds like we should write a test and make it work.

 --
 Mark Miller
 about.me/markrmiller

 On August 19, 2014 at 1:20:54 PM, Timothy Potter (thelabd...@gmail.com) wrote:
 Hi,

 Using the coreAdmin mergeindexes command to merge an index into a
 leader (SolrCloud mode on 4.9.0) and the replica does not do a snap
 pull from the leader as I would have expected. The merge into the
 leader worked like a charm except I had to send a hard commit after
 that (which makes sense).

 I'm guessing the replica would snap pull from the leader if I
 restarted it, but reloading the collection or core does not trigger
 the replica to pull from the leader. This seems like an oversight in
 the mergeindex interaction with SolrCloud. Seems like the simplest
 would be for the leader to send all replicas a request recovery
 command after performing the merge.

 Advice?

 Cheers,
 Tim




Re: Replicating Between Solr Clouds

2014-08-19 Thread reparker23
Are there any more OOB solutions for inter-SolrCloud replication now?  Our
indexing is so slow that we cannot rely on a complete re-index of data from
our DB of record (SQL) to recover data in the Solr indices.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp4121196p4153856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Index not respecting Omit Norms

2014-08-19 Thread Tim.Cardwell
Please reference the below images:

http://lucene.472066.n3.nabble.com/file/n4153863/Schema.png 

http://lucene.472066.n3.nabble.com/file/n4153863/SolrDescriptionSchemaBrowser.png
 

http://lucene.472066.n3.nabble.com/file/n4153863/SolrDescriptionDebugResults.png
 

As you can see from the first image, the text field-type doesn't define the
omitNorms flag, meaning it is set to false. Also on the first image you can
see that the description field doesn't define the omitNorms flag, again
meaning it is set to false. (Default for omitNorms is false). This can all
be confirmed on the second image, where the Properties and Schema rows have
omitNorms set to checked.
I am having some issues understanding why some results have a fieldNorm set
to 1 for matches on the description field. As you can see from the third
image, the description field has a rather large number of terms in it, yet
the fieldNorm is being set to 1.0 for matching 'supply' on the description
field. My guess is that the Omit Norms flag for the 'Index' row is causing
the issue.
Questions:

1. From the first picture, can anyone tell me what each row (Properties,
Schema and Index) refers to? I think the Properties row refers to the flags
set when defining the field type, which for this field is text. The Schema
row refers to the flags set when defining the field, which is description.
I'm not as sure where the Index row flags come from, but I'm assuming it
defines what the index is really representing?
2. Am I right in assuming the Omit Norms flag in the Index row of the first
picture is what is causing the fieldNorm issues in the second image?
3. If I am correct in the above question, how do I fix it?
Additional information:

* I am not using the standard request handler. I am using a custom request
handler that uses eDisMax.
* The description_sortAlpha field that the description field is copied to is
a text field *but* it has omitNorms set to true.
* My index analyzers for the description field are:
WhitespaceTokenizerFactory, StopFilterFactory, WordDelimiterFilterFactory,
LowerCaseFilterFactory and RemoveDuplicatesTokenFilterFactory, in that order.
* My query analyzers for the description field are:
WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory,
WordDelimiterFilterFactory, LowerCaseFilterFactory and
RemoveDuplicatesTokenFilterFactory, in that order.
* The description field is not the only text field having this omit norms
issue in the Index row. There are actually a couple of others.
Thanks,
-Tim




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Index-not-respecting-Omit-Norms-tp4153863.html
Sent from the Solr - User mailing list archive at Nabble.com.

Question on Solr Relevancy using Okapi BM25F

2014-08-19 Thread rks_lucene
I am trying to get OkapiBM25F working over some press release articles I am
indexing. The data has text portions spread across 3 fields - Title, Summary
and Full Article. 

I would like to influence the standard BM25 by giving more weight to words
in the title and summary of the article than in the full article. The
importance has to be of the order Title > Summary > Full Article.

I am unable to find schema examples online that can help me with it. 

Can someone guide me with a possible schema for this (or a link to an
article/blog that explains it)?

Thanks for your help.

-Ritesh
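Solr 4.x does not expose BM25F as such, but a common approximation of the field weighting described above is eDisMax per-field boosting, optionally combined with a BM25 similarity in schema.xml. A sketch only; the field names are assumptions, not from a real schema:

```xml
<!-- Sketch: weight title over summary over full article via edismax qf
     boosts. Field names (title, summary, full_article) are placeholders. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^3.0 summary^2.0 full_article^1.0</str>
  </lst>
</requestHandler>
```

This is not true BM25F (which normalizes field lengths jointly before a single saturation step), but it is the usual out-of-the-box way to enforce Title > Summary > Full Article importance.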



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-on-Solr-Relevancy-using-Okapi-BM25F-tp4153866.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance of Boolean query with hundreds of OR clauses.

2014-08-19 Thread Jack Krupansky
A large number of query terms is definitely an anti-pattern and not a 
recommended use case for Solr, but I'm a little surprised that it takes 
minutes, as opposed to 10 to 20 seconds.


Does your index fit entirely in the OS system memory available for file 
caching?


IOW, are those few minutes CPU-bound or I/O-bound?

-- Jack Krupansky

-Original Message- 
From: SolrUser1543

Sent: Tuesday, August 19, 2014 2:57 PM
To: solr-user@lucene.apache.org
Subject: Performance of Boolean query with hundreds of OR clauses.

I am using Solr to search for similar pictures.

For this purpose, every image is indexed as a set of descriptors (a
descriptor is a string of 6 chars).
The number of descriptors per image may vary (from a few to many thousands).

When I want to search for a similar image, I extract the descriptors
from it and create a query like:
MyImage:( desc1 desc2 ...  desc n )

The number of descriptors in a query may also vary. Usually it is about 1000.

Of course the performance of this query is very bad, and it may take a few
minutes to return.

Any ideas for performance improvement?

P.S. I also tried to use LIRE, but it does not fit my use case.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-of-Boolean-query-with-hundreds-of-OR-clauses-tp4153844.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Index not respecting Omit Norms

2014-08-19 Thread Chris Hostetter

: As you can see from the first image, the text field-type doesn't define the
: omitNorms flag, meaning it is set to false. Also on the first image you can
: see that the description field doesn't define the omitNorms flag, again
: meaning it is set to false. (Default for omitNorms is false). This can all
...
: I am having some issues understanding why some results have a fieldNorm set
: to 1 for matches on the description field. As you can see from the third
...
: From the first picture, can anyone tell me what each row (Properties, Schema
: and Index) refers to? I think the Properties row refers to the flags set
: when defining the Field Type, which for this field is text. The Schema row
: refers to the flags set when defining the field, which is description. I'm
: not as sure where the Index row flags come from, but I'm assuming it defines
: what the index is really representing?  
: Am I right in assuming the Omit Norms flag in the Index row of the first
: picture is what is causing fieldNorm issues in the second image?  
: If I am correct in the above question, how do I fix it?

From a quick glance at the UI JavaScript code (and the underlying 
LukeRequestHandler) I'm honestly not sure what the intended difference is 
between the Properties row and the Schema row.

I can tell you that the Index row represents what information about the 
field can actually be extracted from the underlying index itself -- 
completely independently from the schema.   The fact that Omit Norms is 
checked in that row means that there is at least one document in your 
index that was indexed with omitNorms=true.

Most likely what happened is that you indexed a bunch of docs with 
omitNorms=true in your schema.xml, then later changed your schema to use 
norms, but those docs are still there in the index.



-Hoss
http://www.lucidworks.com/


Re: Replicating Between Solr Clouds

2014-08-19 Thread Jeff Wartes

I've been working on this tool, which wraps the collections API to do more
advanced cluster-management operations:
https://github.com/whitepages/solrcloud_manager


One of the operations I've added (copy) is a deployment mechanism that
uses the replication handler's snap puller to hot-load a pre-indexed
collection from one solrcloud cluster into another. You create the same
collection name with the same shard count in two clusters, index into one,
and copy from that into the other.

This method won't work as a method of active replication, since it copies
the whole index. If you only need a periodic copy between data centers
though, or want someplace to restore from in case of critical failure
(until you can properly rebuild), there might be something you can use
here. 




On 8/19/14, 12:45 PM, reparker23 reparke...@gmail.com wrote:

Are there any more OOB solutions for inter-SolrCloud replication now?  Our
indexing is so slow that we cannot rely on a complete re-index of data
from
our DB of record (SQL) to recover data in the Solr indices.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp41211
96p4153856.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: sample Cell schema question

2014-08-19 Thread Aman Tandon
Thanks Aurélien
On Aug 19, 2014 3:00 PM, jmlucjav jmluc...@gmail.com wrote:

 In the sample schema.xml I can see this:

 <!-- Main body of document extracted by SolrCell.
      NOTE: This field is not indexed by default, since it is also
      copied to "text" using copyField below. This is to save space.
      Use this field for returning and highlighting document content.
      Use the "text" field to search the content. -->
 <field name="content" type="text_general" indexed="false" stored="true"
        multiValued="true"/>


 I am wondering, how does having this split in two fields text/content save
 space?
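For context, the saving comes from splitting the stored and indexed roles between two fields: one stored-but-not-indexed, one indexed-but-not-stored. A hedged sketch of that pairing, assuming the stock example schema's "text" field:

```xml
<!-- Sketch: "content" is stored for display/highlighting only; "text" is
     indexed for search only, so neither field pays both costs. -->
<field name="content" type="text_general" indexed="false" stored="true"
       multiValued="true"/>
<field name="text" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="content" dest="text"/>
```

Storing the body once (in content) while indexing it once (in text) avoids duplicating either the stored data or the inverted index across both fields.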



Re: logging in solr

2014-08-19 Thread Aman Tandon
As you are using Tomcat, you can configure the log file name, folder, etc. by
editing the server.xml present in Tomcat's conf directory.
On Aug 19, 2014 4:17 AM, Shawn Heisey s...@elyograg.org wrote:

 On 8/18/2014 2:43 PM, M, Arjun (NSN - IN/Bangalore) wrote:
  Currently in my component Solr is logging to catalina.out. What
 is the configuration needed to redirect those logs to some custom logfile
 eg: Solr.log.

 Solr uses the slf4j library for logging.  Simply change your program to
 use slf4j, and very likely the logs will go to the same place the Solr
 logs do.

 http://www.slf4j.org/manual.html

 See also the wiki page on logging jars and Solr:

 http://wiki.apache.org/solr/SolrLogging

 Thanks,
 Shawn
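Following the SolrLogging wiki page above, a hedged sketch of the log4j.properties approach for redirecting Solr 4.x logs to a solr.log file. This is modeled on the stock file shipped with Solr 4.x; the property names and paths should be verified against your own install:

```properties
# Sketch only: route Solr logging to a rolling solr.log file.
# ${solr.log} is a system property naming the log directory.
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m%n
```

This file must be on the classpath (together with the slf4j log4j binding jars) for the container running Solr to pick it up.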




Re: Apache Solr Wiki

2014-08-19 Thread Erick Erickson
Done, have fun!

On Tue, Aug 19, 2014 at 10:07 AM,  julie.v...@anixter.com wrote:
 user name: julievoss



 From:   Erick Erickson erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 Date:   08/19/2014 10:34 AM
 Subject:Re: Apache Solr Wiki



 Julie:

 bq: Can I also have access to the wiki?

 Sure. You need to create a Wiki logon and let us
 know what that is before we can add you to the list.

 Best,
 Erick

 On Tue, Aug 19, 2014 at 6:54 AM,  julie.v...@anixter.com wrote:
 Can I also have access to the wiki? We are at the outset of a
 Solr/Hybris
 implementation.



 From:   Mark Sun mark...@motionelements.com
 To: solr-user@lucene.apache.org
 Date:   08/18/2014 08:06 PM
 Subject:Apache Solr Wiki



 Dear Solr Wiki admin,

 We are using Solr for our multilingual Asian-language keyword search, as
 well as our visual-similarity search engine (via the pixolution plugin).
 We would like to update the Powered by Solr section, as well as help add
 to the knowledge base for other Solr setups.

 Can you add me, username MarkSun as a contributor to the wiki?

 Thank you!

 Cheers,
 Mark Sun
 CTO

 MotionElements Pte Ltd
 190 Middle Road, #10-05 Fortune Centre
 Singapore 188979
 mark...@motionelements.com

 www.motionelements.com
 =
 Asia-inspired Stock Animation | Video Footage l AE Template online
 marketplace
 =
 This message may contain confidential and/or privileged information.  If
 you are not the addressee or authorized to receive this for the
 addressee,
 you must not use, copy, disclose or take any action based on this
 message
 or any information herein. If you have received this message in error,
 please advise the sender immediately by reply e-mail and delete this
 message.  Thank you for your cooperation.




Re: Help with StopFilterFactory

2014-08-19 Thread Jack Krupansky

What release of Solr?

Do you have autoGeneratePhraseQueries=true on the field?

And when you said "But any of these does", did you mean "But NONE of these 
does"?


-- Jack Krupansky

-Original Message- 
From: heaven

Sent: Tuesday, August 19, 2014 2:34 PM
To: solr-user@lucene.apache.org
Subject: Help with StopFilterFactory

Hi, I have the next text field:

<fieldType name="words_ngram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt"
            ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

url_stopwords.txt looks like:
http
https
ftp
www

So very simple. In index I have:
* twitter.com/testuser

All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser

But any of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser

What am I doing wrong? Analysis makes me think something is wrong with token
positions:
http://lucene.472066.n3.nabble.com/file/n4153839/oi7o69.jpg
but I was thinking StopFilterFactory is supposed to remove the
https/http/ftp/www keywords. Why do they appear there at all? That doesn't
make much sense.

Regards,
Alexander



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-tp4153839.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Inconsistent Solr Index Behavior

2014-08-19 Thread Ethan
A while back we added span support for multi-valued fields and did a full
re-index of data spanning over 4 years.  It worked perfectly for a month,
and then suddenly the results were not reliable anymore.  We are noticing
that the span is not working on most of the data and is returning wrong
results.  Is there anything that might corrupt or drop index data (old
data)? Too little RAM? Something else?

-E


Intermittent error indexing SolrCloud 4.7.0

2014-08-19 Thread S.L
Hi All,

I get No Live SolrServers available to handle this request error
intermittently while indexing in a SolrCloud cluster with 3 shards and
replication factor of 2.

I am using Solr 4.7.0.

Please see the stack trace below.

org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request
at 
org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:352)
~[DynaOCrawlerUtils.jar:?]
at 
org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:640)
~[DynaOCrawlerUtils.jar:?]
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
~[DynaOCrawlerUtils.jar:?]
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
~[DynaOCrawlerUtils.jar:?]
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
~[DynaOCrawlerUtils.jar:?]


Re: Integrating Solr with HBase Using Lily Project

2014-08-19 Thread rulinma
Try Cloudera Search.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Integrating-Solr-with-HBase-Using-Lily-Project-tp4147868p4153906.html
Sent from the Solr - User mailing list archive at Nabble.com.