Re: Provide value to uniqueID

2014-06-09 Thread Shalin Shekhar Mangar
You can specify the file name as the id by adding a TemplateTransformer on
the entity x and specifying ${f.file} as the template value in the id
field. For example:

<dataSource type="FileDataSource" />

  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="F:\Work\Lucene\Solr\Solr Arabic Book" fileName=".txt"
            recursive="true" rootEntity="false">
      <entity name="x" processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}" transformer="TemplateTransformer">
        <field column="rawLine" name="category_name" />
        <field name="id" template="${f.file}" />
      </entity>
    </entity>
  </document>
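
A full import with this configuration can then be triggered over HTTP; a minimal
sketch, assuming the DataImportHandler is registered at /dataimport on core
collection1:

curl "http://localhost:8983/solr/collection1/dataimport?command=full-import&commit=true"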




On Mon, Jun 9, 2014 at 11:23 AM, ienjreny ismaeel.enjr...@gmail.com wrote:

 Hello,

 I am using the following code to read text files

 <dataSource type="FileDataSource" />

   <document>
     <entity name="f" processor="FileListEntityProcessor"
             baseDir="F:\Work\Lucene\Solr\Solr Arabic Book" fileName=".txt"
             recursive="true" rootEntity="false">
       <entity name="x" processor="LineEntityProcessor"
               url="${f.fileAbsolutePath}">
         <field column="rawLine" name="category_name" />
         <field column="???" name="id" />
       </entity>
     </entity>
   </document>

 It is working perfectly except for the id value. How can I use the file name
 (or any other value) as the value for the uniqueID field?







-- 
Regards,
Shalin Shekhar Mangar.


Re: Documents Added Not Available After Commit (Both Soft and Hard)

2014-06-09 Thread Shalin Shekhar Mangar
I think this may be the same bug as LUCENE-5289 which was fixed in 4.5.1.
Can you upgrade to 4.5.1 and see if that solves the problem?
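
For reference, a sketch of the commit settings under discussion, as they would
appear in solrconfig.xml (the interval values are illustrative, matching the
30-minute hard commit and 5-minute soft commit described below):

<autoCommit>
  <maxTime>1800000</maxTime>           <!-- hard commit every 30 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>300000</maxTime>            <!-- soft commit every 5 minutes -->
</autoSoftCommit>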




On Fri, Jun 6, 2014 at 7:17 PM, Justin Sweeney justin.sweene...@gmail.com
wrote:

 Hi,

 An application I am working on indexes documents to a Solr index. This Solr
 index is set up as a single node, without any replication. This index is
 running Solr 4.5.0.

 We have noticed an issue lately that is causing some problems for our
 application. The problem is that we add/update a number of documents in the
 Solr index, and we have the index set up to autoCommit (hard) once every 30
 minutes. In the Solr logs, I am able to see the add command to Solr and I
 can also see Solr start the hard commit. When this hard commit occurs, we
 see the following message:
 INFO  - 2014-06-04 20:13:55.135;
 org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes.
 Skipping IW.commit.

 This only happens sometimes, but Solr will go hours (we have seen 6-12
 hours of this behavior) before it does a hard commit where it finds changes.
 After the hard commit where the changes are found, we are then able to
 search for and find the documents that were added hours ago, but up until
 that point the documents are not searchable.

 We tried enabling autoSoftCommit every 5 minutes in the hope that this
 would help, but we are seeing the same behavior.

 Here is a sampling of the logs showing this occurring (I've trimmed it down
 to just show what is happening):

 INFO  - 2014-06-05 20:00:41.300;
  org.apache.solr.update.processor.LogUpdateProcessor; [zoomCollection]
  webapp=/solr path=/update params={wt=javabin&version=2}
 {add=[359453225]} 0
  0
 
  INFO  - 2014-06-05 20:00:41.376;
  org.apache.solr.update.processor.LogUpdateProcessor; [zoomCollection]
  webapp=/solr path=/update params={wt=javabin&version=2}
 {add=[347170717]} 0
  1
 
  INFO  - 2014-06-05 20:00:51.527;
  org.apache.solr.update.DirectUpdateHandler2; start
 
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 
  INFO  - 2014-06-05 20:00:51.533;
 org.apache.solr.search.SolrIndexSearcher;
  Opening Searcher@257c43d main
 
  INFO  - 2014-06-05 20:00:51.533;
  org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
 
  INFO  - 2014-06-05 20:00:51.545;
 org.apache.solr.core.QuerySenderListener;
  QuerySenderListener sending requests to Searcher@257c43d
  main{StandardDirectoryReader(segments_acl:1367002775953
  _2f28(4.5):C13583563/4081507 _2gl6(4.5):C2754573/193533
  _2g21(4.5):C1046256/296354 _2ge2(4.5):C835858/206139
  _2gqd(4.5):C383500/31051 _2gmu(4.5):C125197/32491 _2grl(4.5):C46906/1255
  _2gpj(4.5):C66480/16562 _2gra(4.5):C364/22 _2gr1(4.5):C36064/2556
  _2gqg(4.5):C42504/21515 _2gqm(4.5):C26821/12659 _2gqu(4.5):C24172/10240
  _2gqy(4.5):C697/215 _2gr2(4.5):C878/352 _2gr7(4.5):C28135/11775
  _2gr9(4.5):C3276/1341 _2grb(4.5):C5/1 _2grc(4.5):C3247/1219
 _2grd(4.5):C6/1
  _2grf(4.5):C5/2 _2grg(4.5):C23659/10967 _2grh(4.5):C1 _2grj(4.5):C1
  _2grk(4.5):C5160/1482 _2grm(4.5):C1210/351 _2grn(4.5):C3957/1372
  _2gro(4.5):C7734/2207 _2grp(4.5):C220/36)}
 
  INFO  - 2014-06-05 20:00:51.546; org.apache.solr.core.SolrCore;
  [zoomCollection] webapp=null path=null
  params={event=newSearcher&q=d_name:ibm&distrib=false} hits=38 status=0
  QTime=0
 
  INFO  - 2014-06-05 20:00:51.546;
 org.apache.solr.core.QuerySenderListener;
  QuerySenderListener done.
 
  INFO  - 2014-06-05 20:00:51.547; org.apache.solr.core.SolrCore;
  [zoomCollection] Registered new searcher Searcher@257c43d
  main{StandardDirectoryReader(segments_acl:1367002775953
  _2f28(4.5):C13583563/4081507 _2gl6(4.5):C2754573/193533
  _2g21(4.5):C1046256/296354 _2ge2(4.5):C835858/206139
  _2gqd(4.5):C383500/31051 _2gmu(4.5):C125197/32491 _2grl(4.5):C46906/1255
  _2gpj(4.5):C66480/16562 _2gra(4.5):C364/22 _2gr1(4.5):C36064/2556
  _2gqg(4.5):C42504/21515 _2gqm(4.5):C26821/12659 _2gqu(4.5):C24172/10240
  _2gqy(4.5):C697/215 _2gr2(4.5):C878/352 _2gr7(4.5):C28135/11775
  _2gr9(4.5):C3276/1341 _2grb(4.5):C5/1 _2grc(4.5):C3247/1219
 _2grd(4.5):C6/1
  _2grf(4.5):C5/2 _2grg(4.5):C23659/10967 _2grh(4.5):C1 _2grj(4.5):C1
  _2grk(4.5):C5160/1482 _2grm(4.5):C1210/351 _2grn(4.5):C3957/1372
  _2gro(4.5):C7734/2207 _2grp(4.5):C220/36)}
 
  INFO  - 2014-06-05 20:01:10.557;
  org.apache.solr.update.DirectUpdateHandler2; start
 
 commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
 
  INFO  - 2014-06-05 20:01:10.559; org.apache.solr.core.SolrCore;
  [zoomCollection] webapp=/solr path=/select
 
  params={fl=d_ticker,d_location,d_id,d_source_count,d_xml_domain,d_cik,d_keyword_count,d_xml_name,d_xml_contact,d_main_domain,d_location_code&start=0&q=d_domain:(www.northwestcollege.edu)&wt=javabin&version=2&rows=99} hits=4
  status=0 QTime=40
 
  INFO  - 2014-06-05 20:01:10.563;
 org.apache.solr.search.SolrIndexSearcher;
  Opening Searcher@69f90ad1 main
 
  INFO  - 

Re: Provide value to uniqueID

2014-06-09 Thread ienjreny
Thanks, it is working fine but I had to change the following line

<field name="id" template="${f.file}" />

to

<field column="id" template="${f.file}" />


On Mon, Jun 9, 2014 at 9:29 AM, Shalin Shekhar Mangar [via Lucene] 
ml-node+s472066n4140715...@n3.nabble.com wrote:

 You can specify the file name as the id by adding a TemplateTransformer on
 the entity x and specifying ${f.file} as the template value in the id
 field. For example:

 <dataSource type="FileDataSource" />

   <document>
     <entity name="f" processor="FileListEntityProcessor"
             baseDir="F:\Work\Lucene\Solr\Solr Arabic Book" fileName=".txt"
             recursive="true" rootEntity="false">
       <entity name="x" processor="LineEntityProcessor"
               url="${f.fileAbsolutePath}" transformer="TemplateTransformer">
         <field column="rawLine" name="category_name" />
         <field name="id" template="${f.file}" />
       </entity>
     </entity>
   </document>




 On Mon, Jun 9, 2014 at 11:23 AM, ienjreny [hidden email] wrote:

  Hello,
 
  I am using the following code to read text files
 
  <dataSource type="FileDataSource" />

    <document>
      <entity name="f" processor="FileListEntityProcessor"
              baseDir="F:\Work\Lucene\Solr\Solr Arabic Book" fileName=".txt"
              recursive="true" rootEntity="false">
        <entity name="x" processor="LineEntityProcessor"
                url="${f.fileAbsolutePath}">
          <field column="rawLine" name="category_name" />
          <field column="???" name="id" />
        </entity>
      </entity>
    </document>
 
  It is working perfectly except for the id value. How can I use the file name
  (or any other value) as the value for the uniqueID field?
 
 
 
 



 --
 Regards,
 Shalin Shekhar Mangar.








Solr spellcheck - onlyMorePopular threshold?

2014-06-09 Thread Alistair
Hello all,

I was wondering what the onlyMorePopular option for spellchecking uses as
its threshold. Will it always pick the suggestion that returns the most
results, or does it base its result on some threshold that can be
configured?

Thanks!

Ali.





Re: slow performance on simple filter

2014-06-09 Thread mizayah
I'm really at a dead end.

My index is 5.6GB with about 8 million documents.
The field I'm using for the filter is simple as hell.

  <field name="class_name" type="string" indexed="true" stored="true"
         multiValued="false"/>

Could other fields affect my search if I only run a filter query?
solr/puls-objects-prod/select?q=*%3A*&fq=class_name:License




my results:
<int name="QTime">831</int>
<lst name="params">
  <str name="q">*:*</str>
  <str name="fq">class_name:License</str>
</lst>
</lst>
<result name="response" numFound="8655108" start="0">





writing logs of a specific Solr posting to a file

2014-06-09 Thread pshahukhal
Hi,
   I am using SimplePostTool to post XML files to Solr, like:

java  -Durl=http://localhost:8080/solr/collection1/update -jar
/var/lib/tomcat6/solr/collection1/dump/xmlinput/post.jar
/var/lib/tomcat6/solr/collection1/dump/xmlinput/solr.xml

   When there are certain errors, the response from the above command just
shows a 404 or 500 server error but doesn't provide the complete log details
shown in http://localhost:8080/solr/#/~logging or in catalina.out.
   I want to capture the exact log details that are written when the above
command is executed and write them to a file. I am wondering if there are
additional params that need to be passed on the command line, or whether I
have to change the configuration.
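
One server-side option is to route Solr's log output to a dedicated file
through the logging configuration. A log4j.properties sketch (Solr 4.3+ logs
through SLF4J/log4j by default; the file path is an illustrative assumption,
and a Tomcat deployment may need the file placed on the webapp classpath):

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/var/log/solr/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d %-5p [%c] %m%n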
   





How Can I modify the DocList and DocSet in solr

2014-06-09 Thread Vishnu Mishra
I am using Solr 4.6 with Solr sharding (distributed search). I have a
situation where I'd like to modify the Solr search result (DocList and
DocSet) inside Solr's QueryComponent, right after the following method is
called from the process() method:
searcher.search(result, cmd);

Can I modify the DocList and DocSet after the search inside QueryComponent
and add them to the QueryResult?
Also, can I make the DocList unsorted?





Re: SOLR Performance Benchmarking

2014-06-09 Thread Shalin Shekhar Mangar
To be of any help, we'd need to know what your documents look like, what
your queries look like, and what the specifications of your server are. How
much heap is dedicated to Solr? How much free memory is available for the OS
file cache? You have to figure out the bottleneck. Is it CPU, RAM, or disk?
Maybe it's excessive garbage collection? Turn on GC logging and look at GC
activity.
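
For example, GC logging can be enabled with standard HotSpot flags (valid for
Oracle/OpenJDK 6 and 7; the log path is illustrative):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/solr/gc.log -jar start.jar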


On Sun, Jun 8, 2014 at 11:39 PM, rashi gandhi gandhirash...@gmail.com
wrote:

 Hi,

 I am using SolrMeter for performance benchmarking. I can successfully test
 my Solr setup at up to 1,000 queries per minute while searching, but when I
 exceed that limit (say, 1,500 search queries per minute), I get a "Server
 Refused Connection" error from Solr. Currently, I have only one Solr
 server, running on a 64-bit machine with 4 GB of RAM, for testing.

 Please provide some pointers for optimizing Solr so that it can handle a
 larger number of requests (especially more than 1,000 requests per minute).
 Is there any change I can make in solrconfig.xml, or some other change, to
 support this?


 Thanks in Advance









-- 
Regards,
Shalin Shekhar Mangar.


Large disjunction query practices

2014-06-09 Thread Joe Gresock
I'm wondering what the best practice for large disjunct queries in Solr is.
 A user wants to submit a query for several hundred thousand terms, like:
(term1 OR term2 OR ... term500,000)

I know it might be better to break this up into multiple queries that can
be merged on the user's end, but I'm wondering if there's guidance for a
good limit of OR'ed terms per query.  100 terms?  200? 500?  Any idea what
kinds of data set or memory limitations might govern this threshold?

Thanks,
Joe

-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13*


Re: Large disjunction query practices

2014-06-09 Thread Jack Krupansky
Are they expecting relevancy ranking, or merely a bulk read of those
documents? Please detail what the user is trying to accomplish with such a
monster list of IDs.


Generally, queries of more than a few dozen terms are a bad idea. If for no 
other reason than that if you need to debug them or examine the results by 
hand, it will be a nightmare. OTOH, some people really love drama and just 
can't get enough of it.


The general guidance is to keep requests and responses relatively small. 
Keep network traffic down. Keep compute intensity down. Keep memory 
requirements down.


Small is better.
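
As an illustration of the client-side batching Joe mentions, a SolrJ sketch
(the field name, core URL, and batch size are assumptions; the right batch
size has to be found by testing):

import java.util.*;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class BatchedDisjunction {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    List<String> terms = loadTerms();          // the ~500,000 terms to OR together
    Set<Object> ids = new HashSet<Object>();   // merged result ids
    int batch = 200;                           // terms per query; tune by testing
    for (int i = 0; i < terms.size(); i += batch) {
      StringBuilder clause = new StringBuilder();
      for (String t : terms.subList(i, Math.min(i + batch, terms.size()))) {
        if (clause.length() > 0) clause.append(" OR ");
        clause.append(t);                      // terms assumed already query-escaped
      }
      SolrQuery q = new SolrQuery("myfield:(" + clause + ")");
      q.setRows(10000);                        // bulk read, no relevancy ranking
      for (SolrDocument d : solr.query(q).getResults()) {
        ids.add(d.getFieldValue("id"));
      }
    }
    System.out.println("merged hits: " + ids.size());
  }

  static List<String> loadTerms() { return Arrays.asList("term1", "term2"); }
}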

-- Jack Krupansky

-Original Message- 
From: Joe Gresock

Sent: Monday, June 9, 2014 8:50 AM
To: solr-user@lucene.apache.org
Subject: Large disjunction query practices

I'm wondering what the best practice for large disjunct queries in Solr is.
A user wants to submit a query for several hundred thousand terms, like:
(term1 OR term2 OR ... term500,000)

I know it might be better to break this up into multiple queries that can
be merged on the user's end, but I'm wondering if there's guidance for a
good limit of OR'ed terms per query.  100 terms?  200? 500?  Any idea what
kinds of data set or memory limitations might govern this threshold?

Thanks,
Joe

--
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.*-Philippians 4:12-13* 



Re: How Can I modify the DocList and DocSet in solr

2014-06-09 Thread Alexandre Rafalovitch
Can you make a custom Component? They are pluggable.

Regards,
 Alex
On 09/06/2014 6:24 pm, Vishnu Mishra vdil...@gmail.com wrote:

 I am using Solr 4.6 with Solr sharding (distributed search). I have a
 situation where I'd like to modify the Solr search result (DocList and
 DocSet) inside Solr's QueryComponent, right after the following method is
 called from the process() method:
 searcher.search(result, cmd);

 Can I modify the DocList and DocSet after the search inside QueryComponent
 and add them to the QueryResult?
 Also, can I make the DocList unsorted?




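For illustration, a sketch of such a pluggable component, subclassing
QueryComponent (Solr 4.x API; the class name and the exact post-processing are
assumptions, not code from this thread):

import java.io.IOException;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.search.DocListAndSet;

public class PostProcessingQueryComponent extends QueryComponent {
  @Override
  public void process(ResponseBuilder rb) throws IOException {
    super.process(rb);  // internally runs searcher.search(result, cmd)
    DocListAndSet results = rb.getResults();
    // results.docList and results.docSet can be inspected or replaced here,
    // e.g. with a re-ordered DocSlice built from the original hits.
    rb.setResults(results);
  }
}

It would then be registered in place of the stock component in solrconfig.xml,
e.g. <searchComponent name="query" class="com.example.PostProcessingQueryComponent"/>
(the class name is hypothetical).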


Re: Customizing Solr; Where to draw the line?

2014-06-09 Thread Jorge Luis Betancourt Gonzalez
I’d certainly go for the 2nd option. Depending on what you need, you may not
need to modify Solr itself but can extend it using different plugins; you’ll
need to write different components depending on your specific requirements. I
definitely recommend the talks from Trey Grainger, from CareerBuilder. I
remember seeing in some of the talks that they have A/B testing built into
Solr, and a lot of other “crazy” things, so it would be a good starting point,
and it will give you a sense of what you could accomplish by extending Solr.

Of course you’ll need to update your source between big releases of Solr, and
perhaps between some minor ones, but this way you don’t need to worry about
the latency of, or the need to maintain, a new search layer between the
client and Solr.

I hope it helps,

On Jun 8, 2014, at 10:38 PM, Phanindra R phani...@gmail.com wrote:

 Hi,
 
 We have decided to migrate from Lucene 3.x to latest Solr. A lot of
 architectural discussions are going on. There are two possible approaches.
 
 Please note that our customer-facing app (or any client) and Search are
 hosted on different machines.
 
 *1) Have a clean architecture*
- Solr takes care of customized search only.
 
   - We certainly have to override some filtering, scoring,etc.
 
- There will be an intermediary search-app that
 
   - receives queries
  - does a/b testing assignments, and other non-search stuff.
  - does query expansion / rewriting (to avoid every Solr shard doing
  that)
  - transforms query into Solr syntax and uses Solr's http API to
  consume it.
  - returns the response to customer-facing app or whatever the client
  is.
 
   The problem with this approach is the additional layer and the latency
 between search-app and solr. The client of search has to make an API call,
 across the network, to the intermediary search-app which in turns makes
 another Http API call to Solr.
 
 *2) Customize Solr to the full extent*
 
   - Do all the crazy stuff within Solr.
   - We can literally create a new url and register a handler class to
   process that. With some limitations, we should be able to do almost
   anything.
 
  The benefit of this approach is that it obviates the additional layer
  and the latency. However, I see a lot of long-term problems, like making it
  hard to upgrade Solr's version and limiting dev flexibility (usage of
  Spring, Hibernate, etc.).
 
 How about a distributed search? Where do above approaches stand?
 
 I understand that this is a subjective question. It'd be helpful if you
 could share your thoughts and experiences.
 
 Thanks.



Re: Solr Scale Toolkit Access Denied Error

2014-06-09 Thread Mark Gershman
Thanks, Tim.  Worked like a charm. Appreciate your timely assistance.


On Sat, Jun 7, 2014 at 9:13 PM, Timothy Potter thelabd...@gmail.com wrote:

 Hi Mark,

 Sorry for the trouble! I've now made the ami-1e6b9d76 AMI public;
 total oversight on my part :-(. Please try again. Thanks Hoss for
 trying to help out on this one.

 Cheers,
 Tim

 On Fri, Jun 6, 2014 at 6:46 PM, Mark Gershman montan...@gmail.com wrote:
  Thanks, Hoss.
 
  I did substitute the previous AMI ID from the mid-May release of the
  toolkit and the build process does proceed further; however, it appears
 the
  the AMI changed enough that it is not compatible with the new toolkit
  release.  In doing a little more research, I'm inclined to believe that
 the
  permissions on the AMI may be the source of the problem and will post to
  the issue tracker per your suggestion.
 
 
  Mark Gershman
 
 
  On Fri, Jun 6, 2014 at 7:41 PM, Chris Hostetter 
 hossman_luc...@fucit.org
  wrote:
 
 
  : My guess is that the customized toolkit AMI (ami-1e6b9d76) at AWS is not
  : accessible by my AWS credentials.  Is this an AMI permissioning issue or is
  : it a problem with my particular account or how it is configured at AWS.  I
  : did not experience this specific problem when working with the previous
  : iteration of the Solr Scale Toolkit back toward the latter part of May.  It
  : appears that the AMI was updated from ami-96779efe to ami-1e6b9d76 with the
  : newest version of the toolkit.
 
  I'm not much of an AWS expert, but I seem to recall that this type of error
  can happen if you don't have your AWS security group set up properly. Is it
  possible that when you were trying out solr-scale-tk before you had this
  set up, but now you don't?
 
  https://github.com/LucidWorks/solr-scale-tk
 
   You'll need to setup a security group named solr-scale-tk (or update
 the
   fabfile.py to change the name).
  
   At a minimum you should allow TCP traffic to ports: 8983, 8984-8989,
   SSH, and 2181 (ZooKeeper). However, it is your responsibility to
 review
   the security configuration of your cluster and lock it down
  appropriately.
  
   You'll also need to create an keypair (using the Amazon console) named
   solr-scale-tk (you can rename the key used by the framework, see:
   AWS_KEY_NAME). After downloading the keypair file (solr-scale-tk.pem),
   save it to ~/.ssh/ and change permissions: chmod 600
   ~/.ssh/solr-scale-tk.pem
 
  ...if I'm wrong, and there really is a problem with the security on the
  AMI, the best place to report that would be in the project's issue
  tracker...
 
  https://github.com/LucidWorks/solr-scale-tk/issues
 
 
 
  -Hoss
  http://www.lucidworks.com/
 



RE: Solr spellcheck - onlyMorePopular threshold?

2014-06-09 Thread Dyer, James
I believe it will return the terms that are most similar to the queried terms 
but have a greater term frequency than the queried terms.  It doesn't actually 
care what the term frequencies are, only that they are greater than the 
frequencies of the terms you queried on.

I do not know your use case, but you may want to consider using 
spellcheck.alternativeTermCount instead of onlyMorePopular.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
and 
https://issues.apache.org/jira/browse/SOLR-2585?focusedCommentId=13096153page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13096153
 for why.
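
For illustration, a request using those parameters might look like this (the
core name and misspelled query are assumptions; the spellcheck.* parameter
names are real and documented at the links above):

http://localhost:8983/solr/collection1/select?q=name:memry&spellcheck=true&spellcheck.alternativeTermCount=5&spellcheck.maxResultsForSuggest=5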

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alistair [mailto:ali...@gmail.com] 
Sent: Monday, June 09, 2014 3:06 AM
To: solr-user@lucene.apache.org
Subject: Solr spellcheck - onlyMorePopular threshold?

Hello all,

I was wondering what the onlyMorePopular option for spellchecking uses as
its threshold. Will it always pick the suggestion that returns the most
results, or does it base its result on some threshold that can be
configured?

Thanks!

Ali.







Collection communication internally

2014-06-09 Thread Vineet Mishra
Hi All,

I am curious whether communication between multiple collections can be
achieved, and if so, by what means.

The use case: having multiple collections, I need to query the first
collection, get the unique ids from it, and use them to query the second one
(a foreign-key relation). If the number of terms to pass to the second
collection is relatively small, that's fine; otherwise a problem arises, as
adding them to the query is time-consuming in terms of building the query,
sending it to Solr, and waiting for the response.

So the queries would look something like:

http://localhost:7070/solr/recollection/select?q=*:*&fl=id&sort=id_S%20desc
http://localhost:7070/solr/mycollection/select?q=ID:(1 OR 2 OR ... OR 10)&fl=*

For the above form of query, where the query terms expand rapidly, I am
looking for a solution where the collections can resolve the query internally
and fetch the resulting output.

Thanks!


Re: Any way to view lucene files

2014-06-09 Thread Aman Tandon
No. Anyway, thanks Alex, but where is the Luke jar?

With Regards
Aman Tandon


On Mon, Jun 9, 2014 at 6:54 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Have you looked at:
 https://github.com/DmitryKey/luke

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency


 On Mon, Jun 9, 2014 at 8:12 AM, Aman Tandon amantandon...@gmail.com
 wrote:
   I guess this is not available now. I am trying to download it from
   Google; please take a look: https://code.google.com/p/luke/downloads/list
 
  If you have any link please share
 
  With Regards
  Aman Tandon
 
 
  On Sat, Jun 7, 2014 at 10:32 PM, Summer Shire shiresum...@gmail.com
 wrote:
 
 
   Did you try Luke 4.7?
 
 
 
   On Jun 6, 2014, at 11:59 PM, Aman Tandon amantandon...@gmail.com
  wrote:
  
   I also tried with solr 4.2 and with luke version Luke 4.0.0-ALPHA
  
   but got this error:
   java.lang.IllegalArgumentException: A SPI class of type
   org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist.
 You
   need to add the corresponding JAR file supporting this SPI to your
   classpath.The current classpath supports the following names:
 [Lucene40,
   Lucene3x, SimpleText, Appending]
  
   With Regards
   Aman Tandon
  
  
   On Sat, Jun 7, 2014 at 12:22 PM, Aman Tandon amantandon...@gmail.com
 
   wrote:
  
   My solr version is 4.8.1 and luke is 3.5
  
   With Regards
   Aman Tandon
  
  
   On Sat, Jun 7, 2014 at 12:21 PM, Chris Collins ch...@geekychris.com
 
   wrote:
  
   What version of Solr / Lucene are you using?  You have to match the
  Luke
   version to the same version of Lucene.
  
   C
   On Jun 6, 2014, at 11:42 PM, Aman Tandon amantandon...@gmail.com
  wrote:
  
    Yes, I tried, but it's not working at all; every time I choose my index
    directory it shows me EOF past
  
   With Regards
   Aman Tandon
  
  
   On Sat, Jun 7, 2014 at 12:01 PM, Chris Collins 
 ch...@geekychris.com
  
   wrote:
  
   Have you tried:
  
   https://code.google.com/p/luke/
  
   Best
  
   Chris
   On Jun 6, 2014, at 11:24 PM, Aman Tandon amantandon...@gmail.com
 
   wrote:
  
   Hi,
  
    Is there any way I can view what information is in my _e.fnm, etc. files,
    maybe with the help of some application or viewer tool?
  
   With Regards
   Aman Tandon
  
 



Re: ANN: Solr Next

2014-06-09 Thread Yonik Seeley
On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley ysee...@gmail.com wrote:
[...]
 Next major feature: Native Code Optimizations.
 In addition to moving more large data structures off-heap(like
 UnInvertedField?), I am planning to implement native code
 optimizations for certain hotspots.  Native code faceting would be an
 obvious first choice since it can often be a CPU bottleneck.

It's in!  Abbreviated report: 2x performance increase over stock solr
faceting (which is already fast!)
http://heliosearch.org/native-code-faceting/

-Yonik
http://heliosearch.org -- making solr shine

 Project resources:

 https://github.com/Heliosearch/heliosearch

 https://groups.google.com/forum/#!forum/heliosearch
 https://groups.google.com/forum/#!forum/heliosearch-dev

 Freenode IRC: #heliosearch #heliosearch-dev

 -Yonik


Re: Any way to view lucene files

2014-06-09 Thread François Schiettecatte
Just click the 'Releases' link:

https://github.com/DmitryKey/luke/releases

François

On Jun 9, 2014, at 10:43 AM, Aman Tandon amantandon...@gmail.com wrote:

  No. Anyway, thanks Alex, but where is the Luke jar?
 
 With Regards
 Aman Tandon
 
 



Re: Setup a Solr Cloud on a set of powerful machines

2014-06-09 Thread Erick Erickson
Well, you've omitted information about the most precious resource for
Solr, memory.

That said, this question is impossible to answer in the abstract, see:

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Sun, Jun 8, 2014 at 3:17 PM, shushuai zhu ss...@yahoo.com.invalid wrote:
 Hi,

 I would like some advice on setting up a Solr Cloud on a set of powerful
 machines. The average size of the documents handled by the Solr Cloud is
 about 0.5 KB, and the number of documents stored in the Solr Cloud could
 reach billions. When indexing, the incoming document rate could be as high
 as 20k/second, and the major query operations performed on the cloud are
 searching, faceting, and some other aggregations. There will NOT be many
 concurrent queries (a replication factor of 2 may be good enough), but some
 queries could cover a big range of documents.

 As an example, I have 8 powerful machines (nodes), and each machine (node) 
 has:

 16 CPU cores
 256GB RAM
 48TB physical disk space

 The Solr Cloud may be setup in following different ways (assuming replication 
 factor is 2):

 1) 8 shards on 8 Solr servers, total 16 cores (including replicas)
 Each machine (node) holds one Solr server (JVM), and each Solr server has one 
 shard.

 2) 32 shards on 8 Solr servers, total 64 cores (including replicas)
 Each machine (node) holds one Solr server (JVM), and each Solr server has 4 
 shards.

 3) 32 shards on 16 Solr servers, total 64 cores (including replicas)
 Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 2 
 shards.

 4) 64 shards on 16 Solr servers, total 128 cores (including replicas)
 Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 4 
 shards.

 5) 128 shards on 32 Solr servers, total 256 cores (including replicas)
 Each machine (node) holds 4 Solr servers (JVMs), and each Solr server has 4 
 shards.

 Could someone advise which layout is better, or suggest another layout? The
 basic idea is to divide a powerful machine to host more Solr servers and/or
 more shards. I would like some advice about the trade-offs and general
 guidelines for the division. It would be very helpful if you could suggest
 an example setup for this use case.

 Thanks a lot.

 Shushuai


Re: Any way to view lucene files

2014-06-09 Thread Aman Tandon
Yeah, just got it. Thanks François :)

With Regards
Aman Tandon


On Mon, Jun 9, 2014 at 8:20 PM, François Schiettecatte 
fschietteca...@gmail.com wrote:

 Just click the 'Releases' link:

 https://github.com/DmitryKey/luke/releases

 François





Re: Deeply nested structure

2014-06-09 Thread harikrishna
Thanks





Re: Deeply nested structure

2014-06-09 Thread harikrishna
thanks





Re: Collection communication internally

2014-06-09 Thread Erick Erickson
My first answer is don't do it that way :).

Solr works best with flattened (de-normalized) data. If at all possible, you
_really_ would be better off combining the two collections and flattening the
data, even though there would be more data overall.

Whenever I see a question like this, I wonder if you're trying to use
Solr like a DB, in this case with collections substituting for
tables, and this is almost always a mistake.

If you really must do this, consider cross-core joins if at all
possible, but I don't think this is supported yet for distributed
setups.
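
For completeness, a sketch of a single-node cross-core join using the join
query parser (the core and field names are illustrative; as noted above,
fromIndex joins do not work across a distributed/SolrCloud setup):

http://localhost:7070/solr/mycollection/select?q={!join from=id to=ID fromIndex=recollection}*:*&fl=*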

Best,
Erick

On Mon, Jun 9, 2014 at 7:32 AM, Vineet Mishra clearmido...@gmail.com wrote:
 Hi All,

 I am curious whether communication between multiple collections can be
 achieved, and if so, by what means.

 The use case: having multiple collections, I need to query the first
 collection, get the unique ids from it, and use them to query the second
 one (a foreign-key relation). If the number of terms to pass to the second
 collection is relatively small, that's fine; otherwise a problem arises,
 as adding them to the query is time-consuming in terms of building the
 query, sending it to Solr, and waiting for the response.

 So the queries would look something like:

 http://localhost:7070/solr/recollection/select?q=*:*&fl=id&sort=id_S%20desc
 http://localhost:7070/solr/mycollection/select?q=ID:(1 OR 2 OR ... OR 10)&fl=*

 For the above form of query, where the query terms expand rapidly, I am
 looking for a solution where the collections can resolve the query
 internally and fetch the resulting output.

 Thanks!


How to use group and facet?

2014-06-09 Thread Phi Hoang Hai
Dear Solr experts,
I have two problems and need your help.
1) I group a list with group.limit=1&group.main=true&group.sort=Date desc
(many groups, and each group's single element is the newest). Then, from the
list of groups (each with one element), I want to filter out the items
(groups) that do not match a condition. Could you tell me how to do this in
one query?
2) How could I facet and show all records of each facet with one query?

Thank you.

Hai


Re: Setup a Solr Cloud on a set of powerful machines

2014-06-09 Thread Shawn Heisey
On 6/8/2014 4:17 PM, shushuai zhu wrote:
 I would like to get some advice to setup a Solr Cloud on a set of powerful 
 machines. The average size of the documents handled by the Solr Cloud is 
 about 0.5 KB, and the number of documents stored in Solr Cloud could reach 
 billions. When indexing, the incoming document rate could be as high as 
 20k/second; and the major query operations performed on the Cloud are 
 searching, faceting, and some other aggregations. There will NOT be many 
 concurrent queries (replication factor of 2 may be good enough), but some 
 queries could cover big range of documents.

 As an example, I have 8 powerful machines (nodes), and each machine (node) 
 has:

 16 CPU cores
 256GB RAM
 48TB physical disk space

 The Solr Cloud may be setup in following different ways (assuming replication 
 factor is 2):

 1) 8 shards on 8 Solr servers, total 16 cores (including replicas)
 Each machine (node) holds one Solr server (JVM), and each Solr server has one 
 shard. 

 2) 32 shards on 8 Solr servers, total 64 cores (including replicas)
 Each machine (node) holds one Solr server (JVM), and each Solr server has 4 
 shards. 

 3) 32 shards on 16 Solr servers, total 64 cores (including replicas)
 Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 2 
 shards.

 4) 64 shards on 16 Solr servers, total 128 cores (including replicas)
 Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 4 
 shards.

 5) 128 shards on 32 Solr servers, total 256 cores (including replicas)
 Each machine (node) holds 4 Solr servers (JVMs), and each Solr server has 4 
 shards.

Erick's note is very important.  From the information given, we can't
even guess about the size of your index.  Even if we had that
information, there are too many variables to give you any real
recommendations.

Also mentioned by Erick:  RAM is the single greatest factor affecting
Solr performance.  If you have enough OS disk cache to fit your index
entirely in RAM, performance is likely to be excellent.  With 256GB of
RAM on eight servers, you're going to have about 2TB of RAM, some of
which will be used for Solr itself.  If both copies of your index take
up 2TB or less in disk space, you're probably going to be OK there. 
You'd probably be OK up to about 3TB of total index.

The 48TB of disk space is probably serious overkill.  I would assume
this is twelve 4TB drives.  It would be better for performance (without
losing redundancy) to use RAID10 with a stripe size of at least 1MB for
the storage instead of any other RAID level.  It eats up half your raw
space for redundancy, but the performance is *excellent*.

The fact that your query volume will be low does give me the ability to
tell you one thing: With 16 CPU cores per machine and a low query
volume, you'll be able to handle a lot more Solr cores per machine.  The
extra CPU cores can spend their time reading from Solr cores and
speeding up each individual query without worrying about being crushed
under hundreds of queries per second.

For a perfect match of CPU cores to Solr cores, you'd do option number
4, so each machine would get 16 Solr cores ... but I think option number
3 might be better, so you have more CPUs than indexes per machine.  This
gives you a safe capacity of about 32 billion documents, with a maximum
total capacity of well over 64 billion documents.

Thanks,
Shawn



Re: SOLR Performance Benchmarking

2014-06-09 Thread Shawn Heisey
On 6/8/2014 12:09 PM, rashi gandhi wrote:
 I am using SolrMeter for performance benchmarking. I can successfully test
 my Solr setup at up to 1,000 queries per minute while searching, but when
 I exceed that limit (say, 1,500 search queries per minute), I get a
 "Server Refused Connection" error from Solr. Currently, I have only one
 Solr server, running on a 64-bit machine with 4 GB of RAM, for testing.

 Please provide some pointers for optimizing Solr so that it can handle a
 larger number of requests (especially more than 1,000 requests per
 minute). Is there any change I can make in solrconfig.xml, or some other
 change, to support this?

This sounds like your servlet container is configured to limit the
number of threads that can be started.  I would bet that you are using a
packaged Tomcat or Jetty install rather than the Jetty included in the
Solr example, and that it has maxThreads set to the default value of
200.  Solr tends to start a lot of threads internally simply for normal
operation.  If your servlet container is set to limit the number of
total threads to 200 (with a default queue of 100 connections beyond the
200 threads) and you reach the limit because your connection rate is
high, then new connections will be refused.

The Solr example has its servlet container configured to allow ten
thousand threads, so it almost never has this problem.

You'll need to find the documentation for your servlet container and
look there for information on how to increase maxThreads.

For Tomcat 7, the Introduction in the HTTP Connector part of the
documentation mentions the problem:

http://tomcat.apache.org/tomcat-7.0-doc/config/http.html#Introduction
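
For example, in Tomcat's conf/server.xml the limit is raised on the Connector
element (the port and timeout values here are illustrative):

<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="10000"
           connectionTimeout="20000"
           redirectPort="8443" />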

You may also need to increase the amount of RAM in the server (or change
the configuration to reduce heap requirements) to avoid performance
problems that cause each individual query to be slow:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



RE: COMMERCIAL: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-06-09 Thread Ronald Matamoros
Hi Chris,

I created ticket https://issues.apache.org/jira/browse/SOLR-6154 and attached
the data.xml and a PDF with instructions on how to replicate.

Sending different updates to different ports was just how the Confluence
tutorial laid out the steps; it does not affect the result of the test.

As soon as I have more information I will post it to the ticket.
I appreciate the interest; let me know about any suggestions or feedback.

Thank you
Ronald Matamoros


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 06 June 2014 22:00
To: solr-user@lucene.apache.org
Subject: COMMERCIAL: RE: SolrCloud: facet range option 
f.field.facet.mincount=1 omits buckets on response



Ronald: I'm having a little trouble understanding the steps to reproduce that
you are describing -- in particular Step 1 f ii, because I'm not really sure
I understand what exactly you are putting in mem2.xml.

Also: since you don't appear to be using implicit routing, I'm not clear on
why you are explicitly sending different updates to different ports in Step
1 f i -- does that affect the results of your test?


If you can reliably reproduce using modified data from the example, could you
please open a Jira outlining these steps and attach the modified data to
index directly to that issue?  (FWIW: if it doesn't matter what port you use
to send which documents, then you should be able to create a single unified
data.xml file containing all the docs to index in a single command)



: Date: Thu, 29 May 2014 18:06:38 +
: From: Ronald Matamoros rmatamo...@searchtechnologies.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org solr-user@lucene.apache.org
: Subject: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits
: buckets on response
: 
: Hi all,
: 
: At the moment I am reviewing the code to determine if this is a legitimate 
bug that needs to be set as a JIRA ticket.
: Any insight or recommendation is appreciated.
: 
: Including the replication steps as text:
: 
: -
: Solr versions where issue was replicated.
:   * 4.5.1 (Linux)
:   * 4.8.1 (Windows + Cygwin)
: 
: Replicating
: 
:   1. Created two-shard environment - no replication 
:  
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
: 
:  a. Download Solr distribution from 
http://lucene.apache.org/solr/downloads.html 
:  b. Unzipped solr-4.8.1.zip to a temporary location: SOLR_DIST_HOME 
:  c. Ran once so the SolrCloud jars get unpacked: java -jar start.jar
:  d. Create nodes
:   i. cd SOLR_DIST_HOME
:   ii. Via Windows Explorer copied example to node1
:   iii. Via Windows Explorer copied example to node2
: 
:  e. Start Nodes 
:   i. Start node 1
: 
:cd node1
:java -DzkRun -DnumShards=2 
-Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar 
start.jar
: 
:   ii. Start node 2
: 
:cd node2
:java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
: 
:  f. Fed sample documents
:   i. Out of the box
: 
:curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d @mem.xml
:curl http://localhost:7574/solr/update?commit=true -H "Content-Type: text/xml" -d @monitor2.xml
: 
:   ii. Create a copy of mem.xml to mem2.xml; modified identifiers, 
names, prices and fed
: 
:curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d @mem2.xml
: 
:<add>
:  <doc>
:    <field name="id">COMPANY1</field>
:    <field name="name">COMPANY1 Device</field>
:    <field name="manu">COMPANY1 Device Mfg</field>
:    .
:    <field name="price">190</field>
:    .
:  </doc>
:  <doc>
:    <field name="id">COMPANY2</field>
:    <field name="name">COMPANY2 flatscreen</field>
:    <field name="manu">COMPANY2 Device Mfg.</field>
:    .
:    <field name="price">200.00</field>
:    .
:  </doc>
:  <doc>
:    <field name="id">COMPANY3</field>
:    <field name="name">COMPANY3 Laptop</field>
:    <field name="manu">COMPANY3 Device Mfg.</field>
:    .
:    <field name="price">800.00</field>
:    .
:  </doc>
:
:</add>
: 
:   2. Query **without** f.price.facet.mincount=1, counts and buckets are OK
: 
:  

accessing individual elements of a multivalued field

2014-06-09 Thread kritarth.anand
hi,

prod: p
cat: catA, catB, catC

prod: q
cat: catB, catC, catD

My schema consists of documents with a uid ('prod'); each can belong to
multiple categories ('cat'), which are represented as a multivalued field.
For a particular kind of query I need to access the individual elements
separately, as in:

return prod where (cat_1 == catA) or (cat_2 == catB). Is there a way I can
do that?

thanks in advance





solr4 optimization

2014-06-09 Thread Joshi, Shital
 Hi,

We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. On some of
the boxes we have about 5 million deleted docs, and we have never run an
optimize since the beginning. Does the number of deleted docs affect query
performance? Should we consider optimizing at all if we're not worried about
disk space?

Thanks!




SolrCloud collection create / delete failure

2014-06-09 Thread John Smodic
Hey guys,

I'm simply trying to create collection foo in SolrCloud (for a collection
that previously failed to create due to a badly formatted schema).

I try the following:

createCollection foo - could not create a new core 
solr/foo_shard1_replica1 as another core is already defined there
deleteCollection foo - could not find collection foo
unload core foo_shard1_replica1 and delete data dir - no such core exists 
'foo_shard1_replica1'

The directory 'foo_shard1_replica1' exists in my /solr directory.

How can I recover out of this state without manually deleting the 
directory and/or wiping out my ZK?

Thanks,
John

Re: solr4 optimization

2014-06-09 Thread Otis Gospodnetic
Hi,

I don't remember last time I ran optimize.  Sure, yes, things will work
faster if you optimize an index and reduce the number of segments, but if
you are regularly writing to that index and performance is OK, leave it to
Lucene segment merges to purge deletes.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jun 9, 2014 at 4:15 PM, Joshi, Shital shital.jo...@gs.com wrote:

  Hi,

 We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. On some
 of the boxes we have about 5 million deleted docs and we have never run
 optimization since beginning. Does number of deleted docs have anything to
 do with performance of query? Should we consider optimization at all if
 we're not worried about disk space?

 Thanks!





Re: Setup a Solr Cloud on a set of powerful machines

2014-06-09 Thread Gili Nachum
 the incoming document rate could be as high as 20k/second...
That sounds like a lot of CPU-hungry indexing work. Given the 128 CPU cores
available, from an indexing-speed perspective, would you recommend creating a
similar number of Solr cores, or does Solr do just as well with a small
number of Solr cores and several CPU cores per Solr core, since indexing is
multi-threaded?


On Mon, Jun 9, 2014 at 7:19 PM, Shawn Heisey s...@elyograg.org wrote:

 On 6/8/2014 4:17 PM, shushuai zhu wrote:
  I would like to get some advice to setup a Solr Cloud on a set of
 powerful machines. The average size of the documents handled by the Solr
 Cloud is about 0.5 KB, and the number of documents stored in Solr Cloud
 could reach billions. When indexing, the incoming document rate could be as
 high as 20k/second; and the major query operations performed on the Cloud
 are searching, faceting, and some other aggregations. There will NOT be
 many concurrent queries (replication factor of 2 may be good enough), but
 some queries could cover big range of documents.
 
  As an example, I have 8 powerful machines (nodes), and each machine
 (node) has:
 
  16 CPU cores
  256GB RAM
  48TB physical disk space
 
  The Solr Cloud may be setup in following different ways (assuming
 replication factor is 2):
 
  1) 8 shards on 8 Solr servers, total 16 cores (including replicas)
  Each machine (node) holds one Solr server (JVM), and each Solr server
 has one shard.
 
  2) 32 shards on 8 Solr servers, total 64 cores (including replicas)
  Each machine (node) holds one Solr server (JVM), and each Solr server
 has 4 shards.
 
  3) 32 shards on 16 Solr servers, total 64 cores (including replicas)
  Each machine (node) holds 2 Solr servers (JVMs), and each Solr server
 has 2 shards.
 
  4) 64 shards on 16 Solr servers, total 128 cores (including replicas)
  Each machine (node) holds 2 Solr servers (JVMs), and each Solr server
 has 4 shards.
 
  5) 128 shards on 32 Solr servers, total 256 cores (including replicas)
  Each machine (node) holds 4 Solr servers (JVMs), and each Solr server
 has 4 shards.

 Erick's note is very important.  From the information given, we can't
 even guess about the size of your index.  Even if we had that
 information, there are too many variables to give you any real
 recommendations.

 Also mentioned by Erick:  RAM is the single greatest factor affecting
 Solr performance.  If you have enough OS disk cache to fit your index
 entirely in RAM, performance is likely to be excellent.  With 256GB of
 RAM on eight servers, you're going to have about 2TB of RAM, some of
 which will be used for Solr itself.  If both copies of your index take
 up 2TB or less in disk space, you're probably going to be OK there.
 You'd probably be OK up to about 3TB of total index.

 The 48TB of disk space is probably serious overkill.  I would assume
 this is twelve 4TB drives.  It would be better for performance (without
 losing redundancy) to use RAID10 with a stripe size of at least 1MB for
 the storage instead of any other RAID level.  It eats up half your raw
 space for redundancy, but the performance is *excellent*.

 The fact that your query volume will be low does give me the ability to
 tell you one thing: With 16 CPU cores per machine and a low query
 volume, you'll be able to handle a lot more Solr cores per machine.  The
 extra CPU cores can spend their time reading from Solr cores and
 speeding up each individual query without worrying about being crushed
 under hundreds of queries per second.

 For a perfect match of CPU cores to Solr cores, you'd do option number
 4, so each machine would get 16 Solr cores ... but I think option number
 3 might be better, so you have more CPUs than indexes per machine.  This
 gives you a safe capacity of about 32 billion documents, with a maximum
 total capacity of well over 64 billion documents.

 Thanks,
 Shawn




Re: writing logs of a specific Solr posting to a file

2014-06-09 Thread Sameer Maggon
Check out the patch on the issue below. We hit the same issue and posted a
patch; none of the committers have picked it up yet, but it would be good to
get some feedback on it and get this into the next dot release. If it works
for you, please vote it up.

https://issues.apache.org/jira/browse/SOLR-5940

Thanks,
-- 
*Sameer Maggon*
Founder | Measured Search
http://measuredsearch.com



On Mon, Jun 9, 2014 at 3:48 AM, pshahukhal pshahuk...@gmail.com wrote:

 Hi,
    I am using SimplePostTool to post XML files to Solr, like:

 java  -Durl=http://localhost:8080/solr/collection1/update -jar
 /var/lib/tomcat6/solr/collection1/dump/xmlinput/post.jar
 /var/lib/tomcat6/solr/collection1/dump/xmlinput/solr.xml

    When there are certain errors, the response from the above command just
 shows a 404 or 500 server error but doesn't provide the complete log details
 shown in http://localhost:8080/solr/#/~logging or in catalina.out.
    I want to capture the exact log details that are written when the above
 command is executed and write them to a file. I am wondering if there are
 additional params that need to be passed on the command line, or whether I
 have to change the configuration.







Re: accessing individual elements of a multivalued field

2014-06-09 Thread Jack Krupansky

Not currently.

You could have separate explicit fields for the categories such as cat_1, 
cat_2, etc. The data would need to be replicated (possibly using a 
copyField), but redundancy to facilitate access is a reasonable approach.
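
For illustration, a minimal sketch (hypothetical field names; note that a
plain copyField copies every value of a multivalued field, so splitting
the values out by position would likely have to happen in your indexing
client or in an update processor):

<!-- hypothetical positional copies of the multivalued "cat" field -->
<field name="cat"   type="string" indexed="true" stored="true"
       multiValued="true"/>
<field name="cat_1" type="string" indexed="true" stored="true"/>
<field name="cat_2" type="string" indexed="true" stored="true"/>

A query against them would then look like q=cat_1:catA OR cat_2:catB.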


-- Jack Krupansky

-Original Message- 
From: kritarth.anand

Sent: Monday, June 9, 2014 2:48 PM
To: solr-user@lucene.apache.org
Subject: accessing individual elements of a multivalued field

hi,

prod: p
cat: catA, catB, catC

prod: q
cat: catB, catC, catD

My schema consists of documents with a unique id 'prod'; each document can
belong to multiple categories, represented as a multivalued field 'cat'.
For a particular kind of query I need to access the individual elements
separately, as in:

return prod where (cat_1 == catA) or (cat_2 == catB). Is there a way I can
do that?

thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/accessing-individual-elements-of-a-multivalued-field-tp4140862.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: accessing individual elements of a multivalued field

2014-06-09 Thread kritarth.anand
Thanks for the response, Jack.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/accessing-individual-elements-of-a-multivalued-field-tp4140862p4140911.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to simplify my query for appropriate scoring.

2014-06-09 Thread kritarth.anand
hi all,
I need help simplifying my query. The doc structure is as follows.

docStructure:

id: A
cat: p, q, r

id: B
cat: m, n, o

id: C
cat: l, b, o

Given this structure, my job is to find documents whose cat values belong
to a given list, ranked in the order of that list. Right now this is
achieved by ORing multiple queries. Given the list [p, n, o]:

  Query 1: q = cat:p --> doc A
  Query 2: q = cat:n AND !(cat:p) --> doc B
  Query 3: q = cat:o AND !(cat:p) AND !(cat:n) --> doc C

final query = query1^3 OR query2^2 OR query3^1

This is to ensure the ranking is A, B, C.

The query is pretty complicated and gets very long, so I would like to
form a shorter version of it if possible. There are just two constraints:

 a. The highest preference is given to a doc with cat:p, even if some
other doc matches all the other terms. So A should rank higher than B
(even though B matches both n and o).

 b. If two docs both match the first cat, p, they should have equal
scores irrespective of the rest of their cat values. For example,
consider an additional document D:

id: D
cat: [p, n, o]

Now D and A both match the first cat, p, so the fact that D also matches
n and o should not matter; A and D should have the same score.

Please let me know if there is a simple way of doing it.
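
One possible simplification (an untested sketch, assuming the standard
Solr 4.x function queries if(), exists() and query()): since the desired
score is just the weight of the highest-preference category a document
matches, a nested function can compute that weight directly and be used
as the sort:

q=cat:p OR cat:n OR cat:o
sort=if(exists(query({!v='cat:p'})),3,if(exists(query({!v='cat:n'})),2,if(exists(query({!v='cat:o'})),1,0))) desc

This gives A and D the same sort value (3), B gets 2 and C gets 1, which
satisfies both constraints.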









--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-simplifying-my-query-for-appropriate-scoring-tp4140913.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Integrate solr with openNLP

2014-06-09 Thread Vivekanand Ittigi
Hi Aman,

Yeah, we are thinking the same: using UIMA is better. And thanks to
everyone; you guys really showed us the way (UIMA).

We'll work on it.

Thanks,
Vivek


On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com wrote:

 Hi Vivek,

 As everybody on the mailing list mentioned, you should go with UIMA.
 OpenNLP issues are not tracked properly, which could stall your
 development in the near future if an issue comes up, so it's better to
 start investigating UIMA.


 With Regards
 Aman Tandon


 On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi vi...@biginfolabs.com
 wrote:

  Can anyone please reply?
 
  Thanks,
  Vivek
 
  -- Forwarded message --
  From: Vivekanand Ittigi vi...@biginfolabs.com
  Date: Wed, Jun 4, 2014 at 4:38 PM
  Subject: Re: Integrate solr with openNLP
  To: Tommaso Teofili tommaso.teof...@gmail.com
  Cc: solr-user@lucene.apache.org solr-user@lucene.apache.org, Ahmet
  Arslan iori...@yahoo.com
 
 
  Hi Tommaso,
 
  Yes, you are right: version 4.4 works, and I'm able to compile now. I'm
  trying to apply named-entity recognition (person names) but I'm not
  seeing any change. My schema.xml looks like this:
 
  <field name="text" type="text_opennlp_pos_ner" indexed="true"
         stored="true" multiValued="true"/>

  <fieldType name="text_opennlp_pos_ner" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.OpenNLPTokenizerFactory"
                 tokenizerModel="opennlp/en-token.bin"/>
      <filter class="solr.OpenNLPFilterFactory"
              nerTaggerModels="opennlp/en-ner-person.bin"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 
  Please guide me.
 
  Thanks,
  Vivek
 
 
  On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili 
 tommaso.teof...@gmail.com
  
  wrote:
 
   Hi all,
  
   Ahmet was suggesting to use the UIMA integration because OpenNLP
   already has an integration with Apache UIMA, so you would just have
   to use that [1].
   And that's one of the main reasons the UIMA integration was done:
   it's a framework that you can easily hook into in order to plug in
   your NLP algorithm.
  
   If you want to just use OpenNLP, then it's up to you: either write
   your own UpdateRequestProcessor plugin [2] to add metadata extracted
   by OpenNLP to your documents, or write a dedicated analyzer /
   tokenizer / token filter.
  
   For the OpenNLP integration (LUCENE-2899), the patch is not up to
   date with the latest APIs in trunk; however, you should be able to
   apply it (if I recall correctly) to version 4.4 or so, and adapting
   it to the latest API shouldn't be too hard.
  
   Regards,
   Tommaso
  
   [1] :
   http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
   [2] : http://wiki.apache.org/solr/UpdateRequestProcessor
  
  
  
   2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid:
  
   Can you extract names, locations, etc. using OpenNLP in a
   plain/straight Java program?
  
   If yes, here are two seperate options :
  
   1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an
   example: integrate your NER code into it and write your own indexing
   code. You have full power here; no Solr plugins are involved.
  
   2) Use 'Implementing a conditional copyField' given here :
   http://wiki.apache.org/solr/UpdateRequestProcessor
   as an example and integrate your NER code into it.
  
  
   Please note that these are separate ways to enrich your incoming
   documents; choose either (1) or (2).
  
  
  
   On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi 
   vi...@biginfolabs.com wrote:
    Okay, but I didn't understand what you said. Can you please elaborate?
  
   Thanks,
   Vivek
  
  
  
  
  
   On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com
 wrote:
  
Hi Vivekanand,
   
 I have never used UIMA+Solr before.
   
 Personally, I think it takes more time to learn how to configure and
 use the UIMA stuff.
   
   
 If you are familiar with Java, write a class that extends
 UpdateRequestProcessor(Factory). Use OpenNLP for NER, and add the new
 fields (organisation, city, person name, etc.) to your document. This
 phase is usually called 'enrichment'.
   
 Does that make sense?
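
A rough sketch of such a processor (untested; the class name, the
"content" and "person_name" field names, and the model wiring are all
hypothetical, using the OpenNLP 1.5.x API):

import java.io.IOException;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.util.Span;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// Enrichment sketch: run OpenNLP NER over the "content" field and store
// every person name found in a "person_name" field.
public class OpenNLPEnrichProcessor extends UpdateRequestProcessor {
  private final TokenizerME tokenizer;
  private final NameFinderME nameFinder;

  public OpenNLPEnrichProcessor(TokenizerME tokenizer,
      NameFinderME nameFinder, UpdateRequestProcessor next) {
    super(next);
    this.tokenizer = tokenizer;
    this.nameFinder = nameFinder;
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    Object content = doc.getFieldValue("content");
    if (content != null) {
      String[] tokens = tokenizer.tokenize(content.toString());
      for (Span span : nameFinder.find(tokens)) {
        // Join the tokens covered by this span into a single name.
        StringBuilder name = new StringBuilder();
        for (int i = span.getStart(); i < span.getEnd(); i++) {
          if (name.length() > 0) name.append(' ');
          name.append(tokens[i]);
        }
        doc.addField("person_name", name.toString());
      }
      // Reset adaptive data between documents, as the OpenNLP docs advise.
      nameFinder.clearAdaptiveData();
    }
    super.processAdd(cmd);
  }
}

The matching UpdateRequestProcessorFactory would load the .bin model
files, build the TokenizerME/NameFinderME instances and pass them in;
the chain is then referenced from the update handler in solrconfig.xml.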
   
   
   
On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi 
   vi...@biginfolabs.com
wrote:
Hi Ahmet,
   
I followed what you said:
https://cwiki.apache.org/confluence/display/solr/UIMA+Integration.
But how can I achieve my goal? I mean extracting only the name of the
organization or person from the content field.
   
I guess I'm almost there but something is missing; please guide me.
   
Thanks,
Vivek
   
   
   
   
   
On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi 
   vi...@biginfolabs.com
wrote:
   
 Entire goal can't be said, but one of those tasks can be like