Re: Data indexing is going too slow on single shard Why?

2015-03-27 Thread Nitin Solanki
Okay. Thanks Shawn..

On Thu, Mar 26, 2015 at 12:25 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 3/26/2015 12:03 AM, Nitin Solanki wrote:
  Great thanks Shawn...
  As you said - **For 204GB of data per server, I recommend at least 128GB
  of total RAM,
  preferably 256GB**. Therefore, if I have 204GB of data on a single
  server/shard, then I should prefer 256GB, so that searching will be fast
  and never slows down. Is that right?

 Obviously I cannot guarantee it, but I think it's extremely likely that
 with that much memory, performance will be very good.

 One other possibility, which is discussed on that wiki page I linked, is
 that your java heap is being almost exhausted and large amounts of time
 are spent in garbage collection.  If you increase the heap from 4GB to
 5GB and see performance get better, then that would be confirmed.  There
 would be less memory available for caching, but constant garbage
 collection would be a much greater problem than the disk cache being too
 small.
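
 For illustration only (the exact variable depends on how you start Solr;
 SOLR_JAVA_MEM in solr.in.sh is one option on recent versions), a 5GB heap
 would be set with something like:

 SOLR_JAVA_MEM="-Xms5g -Xmx5g"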

 Thanks,
 Shawn




ZFS File System for SOLR 3.6 and SOLR 4

2015-03-27 Thread abhi Abhishek
Hello,
 I am trying to use ZFS as the filesystem for my Linux environment. Are
there any performance implications of using any filesystem other than
ext-3/ext-4 with SOLR?

Thanks in Advance

Best Regards,
Abhishek


SOLR Index in shared/Network folder

2015-03-27 Thread abhi Abhishek
Greetings,
  I am trying to use a network shared location as my index directory.
Are there any known problems in using a Network File System for running a
SOLR Instance?

Thanks in Advance.

Best Regards,
Abhishek


Re: SOLR Index in shared/Network folder

2015-03-27 Thread Shawn Heisey
On 3/27/2015 12:06 AM, abhi Abhishek wrote:
 Greetings,
   I am trying to use a network shared location as my index directory.
 Are there any known problems in using a Network File System for running a
 SOLR Instance?

It is not recommended.  You will probably need to change the lockType,
 ... the default "native" probably will not work, and you might need to
 change it to "none" to get it working ... but that disables an important
safety mechanism that prevents index corruption.
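
A minimal sketch of what that change looks like in the indexConfig section
of solrconfig.xml:

<indexConfig>
  <lockType>none</lockType>
</indexConfig>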

http://stackoverflow.com/questions/9599529/solr-over-nfs-problems

Thanks,
Shawn



Database vs Solr : ID based filtering

2015-03-27 Thread Aman Tandon
Hi,

Will ID-based filtering on Solr perform worse than on a DB?

<field name="id" type="string" indexed="true" stored="true"/>

   - http://localhost:8983/solr/select?q=*&fq=id:153

   *OR*

   - select * from TABLE where id=153


With Regards
Aman Tandon


Re: Database vs Solr : ID based filtering

2015-03-27 Thread Aman Tandon

 so you’ll end up forever invalidating your cache.


What if we have 1 million IDs assigned to different users and each user
performs the query on Solr daily? Will those entries be there forever then?

With Regards
Aman Tandon

On Fri, Mar 27, 2015 at 1:50 PM, Upayavira u...@odoko.co.uk wrote:

 The below won’t perform well. You’ve used a filter query, which will be
 cached, so you’ll end up forever invalidating your cache.

 Better would be http://localhost:8983/solr/select?q=id:153

 Perhaps better still would be http://localhost:8983/solr/get?id=153

 The latter is a “real time get” which will return a document that hasn’t
 even been soft-committed yet.

 As to which performs better, I’d encourage you to set up a simple
 experiment, and try it out.

 Upayavira

 On Fri, Mar 27, 2015, at 06:56 AM, Aman Tandon wrote:
  Hi,
 
  Will ID-based filtering on Solr perform worse than on a DB?
  
  <field name="id" type="string" indexed="true" stored="true"/>
  
 - http://localhost:8983/solr/select?q=*&fq=id:153
  
 *OR*
  
 - select * from TABLE where id=153
 
 
  With Regards
  Aman Tandon



Re: SOLR 5.0.0 and Tomcat version ?

2015-03-27 Thread Per Steffensen

On 23/03/15 20:05, Erick Erickson wrote:

you don't run a SQL engine from a servlet
container, why should you run Solr that way?

https://twitter.com/steff1193/status/580491034175660032
https://issues.apache.org/jira/browse/SOLR-7236?focusedCommentId=14383624&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14383624
etc

Not that I want to start the discussion again. The war seems to be lost.


Tweaking SOLR memory and cull facet words

2015-03-27 Thread phiroc
Hi,

my SOLR 5 solrconfig.xml file contains the following lines:

<!-- Faceting defaults -->
<str name="facet">on</str>
<str name="facet.field">text</str>
<str name="facet.mincount">100</str>


where the 'text' field contains thousands of words.

When I start SOLR, the search engine takes several minutes to index the words 
in the 'text' field (although loading the browse template later only takes a 
few seconds because the 'text' field has already been indexed).

Here are my questions:

- should I increase SOLR's JVM memory to make initial indexing faster?

e.g., SOLR_JAVA_MEM="-Xms1024m -Xmx204800m" in solr.in.sh

- how can I cull facet words according to certain criteria (length, case, 
etc.)? For instance, my facets are the following:

application (22427)
inytapdf0 (22427)
pdf (22427)
the (22334)
new (22131)
herald (21983)
york (21975)
paris (21780)
a (21692)
and (21298)
of (21288)
i (21247)
in (21062)
to (20918)
on (20899)
m (20857)
by (20733)
de (20664)
for (20580)
at (20417)
with (20371) 
...

Obviously, words such as "the", "i", "to", "m", etc. should not be indexed. 
Furthermore, I don't care about nouns. I am only interested in people and 
location names.


Many thanks.

Philippe







Re: Database vs Solr : ID based filtering

2015-03-27 Thread Mikhail Khludnev
For a single 'where' clause, an RDBMS with an index performs comparably to
an inverted index. The inverted index wins on multiple 'where' clauses,
where it doesn't need composite indices; multivalued fields are also an
intrinsic advantage. More details at
http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal


On Fri, Mar 27, 2015 at 9:56 AM, Aman Tandon amantandon...@gmail.com
wrote:

 Hi,

 Will ID-based filtering on Solr perform worse than on a DB?

 <field name="id" type="string" indexed="true" stored="true"/>

- http://localhost:8983/solr/select?q=*&fq=id:153

*OR*

- select * from TABLE where id=153


 With Regards
 Aman Tandon




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Database vs Solr : ID based filtering

2015-03-27 Thread Upayavira
The below won’t perform well. You’ve used a filter query, which will be
cached, so you’ll end up forever invalidating your cache.

Better would be http://localhost:8983/solr/select?q=id:153

Perhaps better still would be http://localhost:8983/solr/get?id=153

The latter is a “real time get” which will return a document that hasn’t
even been soft-committed yet.
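
If it is not already in your solrconfig.xml, the real time get handler is
configured roughly like this (it also requires the update log to be enabled
and a _version_ field in the schema):

<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>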

As to which performs better, I’d encourage you to set up a simple
experiment, and try it out.

Upayavira

On Fri, Mar 27, 2015, at 06:56 AM, Aman Tandon wrote:
 Hi,
 
 Will ID-based filtering on Solr perform worse than on a DB?
 
 <field name="id" type="string" indexed="true" stored="true"/>
 
- http://localhost:8983/solr/select?q=*&fq=id:153
 
*OR*
 
- select * from TABLE where id=153
 
 
 With Regards
 Aman Tandon


Re: Solr replicas going in recovering state during heavy indexing

2015-03-27 Thread Per Steffensen
I think it is very likely that it is due to Solr-nodes losing 
ZK-connections (after timeout). We have experienced that a lot. One 
thing you want to do is to make sure your ZK-servers do not run on 
the same machines as your Solr-nodes - that helped us a lot.


On 24/03/15 13:57, Gopal Jee wrote:

Hi
We have a large solrcloud cluster. We have observed that during heavy
indexing, large number of replicas go to recovering or down state.
What could be the possible reason and/or fix for the issue.

Gopal





Unable to perform search query after changing uniqueKey

2015-03-27 Thread Zheng Lin Edwin Yeo
Hi everyone,

I've changed my uniqueKey to another name, instead of using id, on the
schema.xml.

However, after I have done the indexing (the indexing is successful), I'm
not able to perform a search query on it. It gives the error
java.lang.NullPointerException.

Is there any other place which I need to configure, besides changing the
uniqueKey field in schema.xml?

Regards,
Edwin


Re: Unable to perform search query after changing uniqueKey

2015-03-27 Thread Andrea Gazzarini

Hi Edwin,
please provide some other detail about your context, (e.g. complete 
stacktrace, query you're issuing)


Best,
Andrea

On 03/27/2015 09:38 AM, Zheng Lin Edwin Yeo wrote:

Hi everyone,

I've changed my uniqueKey to another name, instead of using id, on the
schema.xml.

However, after I have done the indexing (the indexing is successful), I'm
not able to perform a search query on it. It gives the error
java.lang.NullPointerException.

Is there any other place which I need to configure, besides changing the
uniqueKey field in schema.xml?

Regards,
Edwin





Re: Solr advanced StopFilterFactory

2015-03-27 Thread Erik Hatcher
Alex - that’s definitely possible, with performance being the main 
consideration here.

But since this is for query-time stop words, maybe instead your fronting 
application could take the user's list and remove those words from the query 
before sending it to Solr? 

I’m curious what the ultimate goal / use case is for this feature, which may 
help us better guide you on ways to do what you need.


—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com http://www.lucidworks.com/




 On Mar 27, 2015, at 8:32 AM, Alex Sylka sylkaa...@gmail.com wrote:
 
 We need advanced stop words filter in Solr.
 
 We need stopwords to be stored in db and ability to change them by users
 (each user should have own stopwords). That's why I am thinking about
 sending stop words to solr from our app or connect to our db from solr and
 use updated stop words in custom StopFilterFactory.
 
 Also each user will have own stopwords list which will be stored in mysql
 db stopwords table. (id, user_id, stopword).
 
 We have next index structure. This index will store data for all users.
 
   <field name="user_id" type="int" indexed="true" stored="true"
  required="true" multiValued="false"/>
   <field name="tag_name" type="text_general" indexed="true"
  stored="true" required="false" multiValued="false"/>
   ...
   <field name="tag_description" type="text_general" indexed="true"
  stored="true" required="false" multiValued="false"/>
 
  I am not sure how to achieve the behaviour described above, but I am thinking
  about writing my own custom StopFilterFactory which will grab stopwords from
  the db and use different stopwords for users while indexing their documents.
  
  What can you suggest? Is that possible? Am I on the right track?



Re: Installing the auto-phrase-tokenfilter

2015-03-27 Thread Andrea Gazzarini

Hi,
I never used that but I think you should

- get the source code / clone the repository
- run the ant build (I see a dist target)
- put the artifact in your core / shared lib dir so Solr can see that 
library

- have a look at the README [1] for how to use that

Best,
Andrea

[1] 
https://github.com/LucidWorks/auto-phrase-tokenfilter/blob/master/README.md
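
As a rough sketch of those steps on the command line (the lib path is
illustrative, and I am assuming the dist target drops a jar under dist/):

git clone https://github.com/LucidWorks/auto-phrase-tokenfilter.git
cd auto-phrase-tokenfilter
ant dist
cp dist/*.jar /path/to/solr/core/lib/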


On 03/27/2015 01:02 PM, afrooz wrote:

I am also, can anyone help us?







Solr advanced StopFilterFactory

2015-03-27 Thread Alex Sylka
We need advanced stop words filter in Solr.

We need stopwords to be stored in db and ability to change them by users
(each user should have own stopwords). That's why I am thinking about
sending stop words to solr from our app or connect to our db from solr and
use updated stop words in custom StopFilterFactory.

Also each user will have own stopwords list which will be stored in mysql
db stopwords table. (id, user_id, stopword).

We have next index structure. This index will store data for all users.

  <field name="user_id" type="int" indexed="true" stored="true"
required="true" multiValued="false"/>
  <field name="tag_name" type="text_general" indexed="true"
stored="true" required="false" multiValued="false"/>
  ...
  <field name="tag_description" type="text_general" indexed="true"
stored="true" required="false" multiValued="false"/>

I am not sure how to achieve the behaviour described above, but I am thinking
about writing my own custom StopFilterFactory which will grab stopwords from
the db and use different stopwords for users while indexing their documents.

What can you suggest? Is that possible? Am I on the right track?


Re: Installing the auto-phrase-tokenfilter

2015-03-27 Thread afrooz
I am also, can anyone help us?





Re: Tweaking SOLR memory and cull facet words

2015-03-27 Thread Shawn Heisey
On 3/27/2015 4:14 AM, phi...@free.fr wrote:
 Hi,
 
 my SOLR 5 solrconfig.xml file contains the following lines:
 
  <!-- Faceting defaults -->
  <str name="facet">on</str>
  <str name="facet.field">text</str>
  <str name="facet.mincount">100</str>
 
 
 where the 'text' field contains thousands of words.
 
 When I start SOLR, the search engine takes several minutes to index the words 
 in the 'text' field (although loading the browse template later only takes a 
 few seconds because the 'text' field has already been indexed).
 
 Here are my questions:
 
 - should I increase SOLR's JVM memory to make initial indexing faster?
 
 e.g., SOLR_JAVA_MEM="-Xms1024m -Xmx204800m" in solr.in.sh
 
 - how can I cull facet words according to certain criteria (length, case, 
 etc.)? For instance, my facets are the following:
 
 application (22427)
 inytapdf0 (22427)
 pdf (22427)
 the (22334)
 new (22131)
 herald (21983)
 york (21975)
 paris (21780)
 a (21692)
 and (21298)
 of (21288)
 i (21247)
 in (21062)
 to (20918)
 on (20899)
 m (20857)
 by (20733)
 de (20664)
 for (20580)
 at (20417)
 with (20371) 
 ...
 
 Obviously, words such as "the", "i", "to", "m", etc. should not be indexed. 
 Furthermore, I don't care about nouns. I am only interested in people and 
 location names.

Starting Solr does not index anything, unless you are talking about one
of the sidecar indexes for spelling correction or suggestions.  You must
send indexing requests to Solr, and if you are experiencing slow
indexing, chances are that it's because of slowness in obtaining data
from the source, not Solr ... or that you are indexing with a single
thread.  If you can set up multiple threads or processes that are
indexing in parallel, it should go faster.

Thousands of terms are not hard for Solr to handle at all.  When the
number of terms gets into the millions or billions, then it starts
becoming a hard problem.

If you use the stopword filter on the index analysis chain for the field
that you are using for facets, then all the stopwords will be removed
from the facets.  That would change how searches work on the field, so
you will probably want to use copyField to create a new field that you
use for faceting.  There are other filters that can do things you have
mentioned, like LengthFilterFactory:

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
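
As a sketch (the field and type names here are only placeholders), the
copyField approach might look like this in schema.xml:

<field name="text_facet" type="text_facet" indexed="true" stored="false"/>
<copyField source="text" dest="text_facet"/>

<fieldType name="text_facet" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LengthFilterFactory" min="3" max="64"/>
  </analyzer>
</fieldType>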

As far as java heap sizing, trial and error is about the only way to
find the right size.

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn



Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)

2015-03-27 Thread Shawn Heisey
On 3/27/2015 7:07 AM, Russell Taylor wrote:
 Hi Shawn, thanks for the quick reply.
 
 I've looked at both methods and I think that they won't work for a number of 
 reasons:
 
 1)
 uniqueKey:
  I could use the uniqueKey and overwrite the original document but I need to 
 remove the documents which 
 are not on my new input list and the issue with the uniqueKey method is I 
 don't know what to delete.
 
 Documents on the index:
 docs: [
 {
 id:1
 keyField:A
 },{
 id:2
 keyField:A
 },{
 id:3
 keyField:B
 }
 ]
 New Documents to go on index
 docs: [
 {
 id:1
 keyField:A
 },{
 id:3
 keyField:B
 }
 ]
 I would never know that id:2 should be deleted. (on some new document lists 
 the delete list could be in the millions).
 
 2)
 openSearcher:
 My openSearcher is set to false and I've also commented out autoSoftCommit so 
 I don't get a partial list being returned on a query.
 <!--
 <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
 </autoSoftCommit>
 -->
 
 
 So is there another way to keep the original set of documents until the new 
 set has been added to the index?

If you are 100% in control of when commits with openSearcher=true are
sent, which it sounds like you probably are, then you can do anything
you want from the start of indexing until commit time, and the user will
never see any of it, until the commit happens.  That allows the
following relatively simple paradigm:

1) Delete LOTS of stuff, or perhaps everything in the index with a
deleteByQuery of *:* (for all documents).

2) Index everything you need to index.

3) Commit.
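
Expressed as XML update messages (a sketch only; the same can be done
through SolrJ):

<delete><query>*:*</query></delete>   <!-- step 1: delete everything -->
<add><doc>...</doc></add>             <!-- step 2: repeat for all documents -->
<commit/>                             <!-- step 3: make it all visible -->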

Thanks,
Shawn



RE: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)

2015-03-27 Thread Russell Taylor
Hi Shawn, thanks for the quick reply.

I've looked at both methods and I think that they won't work for a number of 
reasons:

1)
uniqueKey:
 I could use the uniqueKey and overwrite the original document but I need to 
remove the documents which 
are not on my new input list and the issue with the uniqueKey method is I don't 
know what to delete.

Documents on the index:
docs: [
{
id:1
keyField:A
},{
id:2
keyField:A
},{
id:3
keyField:B
}
]
New Documents to go on index
docs: [
{
id:1
keyField:A
},{
id:3
keyField:B
}
]
I would never know that id:2 should be deleted. (on some new document lists the 
delete list could be in the millions).

2)
openSearcher:
My openSearcher is set to false and I've also commented out autoSoftCommit so I 
don't get a partial list being returned on a query.
<!--
<autoSoftCommit>
   <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
</autoSoftCommit>
-->


So is there another way to keep the original set of documents until the new set 
has been added to the index?


Thanks


Russ.




-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 26 March 2015 16:06
To: solr-user@lucene.apache.org
Subject: Re: Replacing a group of documents (Delete/Insert) without a query on 
the index ever showing an empty list (Docs)

On 3/26/2015 9:53 AM, Russell Taylor wrote:
 I have an index which is made up of groups of documents, each group is 
 defined by a field called keyField (keyField:A).
 I need to delete all the keyField:A documents and replace them with a 
 brand new set without the index ever returning zero documents on a query.

 At the moment I deleteByQuery:keyField:A and then insert a 
 SolrInputDocument list via SolrJ into my index. I have a small time 
 period where somebody doing a q=fieldKey:A can be returned an empty list.

 FYI: The keyField group might be just 100 documents or up to 10 million.

As long as you don't have any commits with openSearcher=true happening between 
the delete and the insert, that would work ... but why go through the manual 
delete if you don't have to?

If you define a suitable uniqueKey field in your schema, simply indexing a new 
document with the same value in the uniqueKey field as an existing document will 
delete the old document.

https://wiki.apache.org/solr/UniqueKey
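
For reference, the relevant schema.xml declarations look something like this
(using an id field as the example):

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>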

Thanks,
Shawn





RE: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)

2015-03-27 Thread Russell Taylor
Yes that works and now I have a better understanding of the soft and hard 
commits to boot.

Thanks again Shawn.


Russ.

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: 27 March 2015 13:22
To: solr-user@lucene.apache.org
Subject: Re: Replacing a group of documents (Delete/Insert) without a query on 
the index ever showing an empty list (Docs)

On 3/27/2015 7:07 AM, Russell Taylor wrote:
 Hi Shawn, thanks for the quick reply.
 
 I've looked at both methods and I think that they won't work for a number of 
 reasons:
 
 1)
 uniqueKey:
  I could use the uniqueKey and overwrite the original document but I 
 need to remove the documents which are not on my new input list and the issue 
 with the uniqueKey method is I don't know what to delete.
 
 Documents on the index:
 docs: [
 {
 id:1
 keyField:A
 },{
 id:2
 keyField:A
 },{
 id:3
 keyField:B
 }
 ]
 New Documents to go on index
 docs: [
 {
 id:1
 keyField:A
 },{
 id:3
 keyField:B
 }
 ]
 I would never know that id:2 should be deleted. (on some new document lists 
 the delete list could be in the millions).
 
 2)
 openSearcher:
 My openSearcher is set to false and I've also commented out autoSoftCommit so 
 I don't get a partial list being returned on a query.
 <!--
 <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
 </autoSoftCommit>
 -->
 
 
 So is there another way to keep the original set of documents until the new 
 set has been added to the index?

If you are 100% in control of when commits with openSearcher=true are sent, 
which it sounds like you probably are, then you can do anything you want from 
the start of indexing until commit time, and the user will never see any of it, 
until the commit happens.  That allows the following relatively simple paradigm:

1) Delete LOTS of stuff, or perhaps everything in the index with a 
deleteByQuery of *:* (for all documents).

2) Index everything you need to index.

3) Commit.

Thanks,
Shawn





Re: Tweaking SOLR memory and cull facet words

2015-03-27 Thread Shawn Heisey
On 3/27/2015 8:10 AM, phi...@free.fr wrote:
 You must send indexing requests to Solr,
 
 Are you referring to posting <add></add> queries to SOLR, or to something 
 else?
 
 If you can set up multiple threads or processes...
 
 How do you do that?

Yes, I am referring to posting requests to the /update handler.

Since you would be writing the program, making it multithreaded or
multi-process is up to you and the features of the language you are
writing in.

 https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
 
 Can you update the stopwords.txt file, and then re-index the documents?
 
 How?

http://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Re: Installing the auto-phrase-tokenfilter

2015-03-27 Thread afrooz
Thanks,
my main issue is that I am a .NET developer, but I need to use this class
within Solr and call it somehow from .NET. The issue is that I want the jar
file from this source code; from my searches I think I have to install Ant
and run it within Eclipse...
I tried creating a jar file through the java command, but it seems
those jar files are not working fine while I am using them within Solr. I
have a question: if there are 3 classes within the source file, do I need
a jar file for each class, or should I generate one jar with all of them?
And if within Solr it is called via
class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory", then what
should be the name of my jar file? The name written in the build is
Auto-Phrase-TokenFilter.
I am confused, please explain it for me.
Thank you in advance 





Re: Tweaking SOLR memory and cull facet words

2015-03-27 Thread phiroc
Hi Shawn,

 You must send indexing requests to Solr,

Are you referring to posting <add></add> queries to SOLR, or to something 
else?

 If you can set up multiple threads or processes...

How do you do that?

 https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory

Can you update the stopwords.txt file, and then re-index the documents?

How?

Many thanks.

Philippe






----- Original Message -----
From: Shawn Heisey apa...@elyograg.org
To: solr-user@lucene.apache.org
Sent: Friday, March 27, 2015 14:38:20
Subject: Re: Tweaking SOLR memory and cull facet words

On 3/27/2015 4:14 AM, phi...@free.fr wrote:
 Hi,
 
 my SOLR 5 solrconfig.xml file contains the following lines:
 
 <!-- Faceting defaults -->
 <str name="facet">on</str>
 <str name="facet.field">text</str>
 <str name="facet.mincount">100</str>
 
 
 where the 'text' field contains thousands of words.
 
 When I start SOLR, the search engine takes several minutes to index the words 
 in the 'text' field (although loading the browse template later only takes a 
 few seconds because the 'text' field has already been indexed).
 
 Here are my questions:
 
 - should I increase SOLR's JVM memory to make initial indexing faster?
 
 e.g., SOLR_JAVA_MEM=-Xms1024m -Xmx204800m in solr.in.sh
 
 - how can I cull facet words according to certain criteria (length, case, 
 etc.)? For instance, my facets are the following:
 
 application (22427)
 inytapdf0 (22427)
 pdf (22427)
 the (22334)
 new (22131)
 herald (21983)
 york (21975)
 paris (21780)
 a (21692)
 and (21298)
 of (21288)
 i (21247)
 in (21062)
 to (20918)
 on (20899)
 m (20857)
 by (20733)
 de (20664)
 for (20580)
 at (20417)
 with (20371) 
 ...
 
 Obviously, words such as "the", "i", "to", "m", etc. should not be indexed. 
 Furthermore, I don't care about nouns. I am only interested in people and 
 location names.

Starting Solr does not index anything, unless you are talking about one
of the sidecar indexes for spelling correction or suggestions.  You must
send indexing requests to Solr, and if you are experiencing slow
indexing, chances are that it's because of slowness in obtaining data
from the source, not Solr ... or that you are indexing with a single
thread.  If you can set up multiple threads or processes that are
indexing in parallel, it should go faster.

Thousands of terms are not hard for Solr to handle at all.  When the
number of terms gets into the millions or billions, then it starts
becoming a hard problem.

If you use the stopword filter on the index analysis chain for the field
that you are using for facets, then all the stopwords will be removed
from the facets.  That would change how searches work on the field, so
you will probably want to use copyField to create a new field that you
use for faceting.  There are other filters that can do things you have
mentioned, like LengthFilterFactory:

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory

As far as java heap sizing, trial and error is about the only way to
find the right size.

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn



Re: ZFS File System for SOLR 3.6 and SOLR 4

2015-03-27 Thread Shawn Heisey
On 3/27/2015 12:30 AM, abhi Abhishek wrote:
  i am trying to use ZFS as filesystem for my Linux Environment. are
 there any performance implications of using any filesystem other than
 ext-3/ext-4 with SOLR?

That should work with no problem.

The only time Solr tends to have problems is if you try to use a network
filesystem.  As long as it's a local filesystem and it implements
everything a program can typically expect from a local filesystem, Solr
should work perfectly.

Because of the compatibility problems that the license for ZFS has with
the GPL, ZFS on Linux is probably not as well tested as other
filesystems like ext4, xfs, or btrfs, but I have not heard about any big
problems, so it's probably safe.

Thanks,
Shawn



Re: Installing the auto-phrase-tokenfilter

2015-03-27 Thread Shawn Heisey
On 3/27/2015 7:45 AM, afrooz wrote:
 my main issue is that I am a .NET developer, but I need to use this class
 within Solr and call it somehow from .NET. The issue is that I want the jar
 file from this source code; from my searches I think I have to install Ant
 and run it within Eclipse...
 I tried creating a jar file through the java command, but it seems
 those jar files are not working fine while I am using them within Solr. I
 have a question: if there are 3 classes within the source file, do I need
 a jar file for each class, or should I generate one jar with all of them?
 And if within Solr it is called via
 class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory", then what
 should be the name of my jar file? The name written in the build is
 Auto-Phrase-TokenFilter.
 I am confused, please explain it for me

This code is from LucidWorks, not the Solr project.  You'll need to talk
to them for help on it.  One avenue is their issue tracker, but if they
run their project like we run ours, they probably prefer that you ask on
a mailing list or some other kind of support forum before you file an
issue.  I do not know where those resources might be.  There are a
number of LucidWorks employees on this mailing list, perhaps one of them
might be able to direct you.

https://github.com/LucidWorks/auto-phrase-tokenfilter/issues

Thanks,
Shawn



Re: SOLR Index in shared/Network folder

2015-03-27 Thread Walter Underwood
Several years ago, I accidentally put Solr indexes on an NFS volume and it was 
100X slower.

If you have enough RAM, query speed should be OK, but startup time (loading 
indexes into file buffers) could be really long. Indexing could be quite slow.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Mar 26, 2015, at 11:31 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 3/27/2015 12:06 AM, abhi Abhishek wrote:
 Greetings,
  I am trying to use a network shared location as my index directory.
 Are there any known problems in using a Network File System for running a
 SOLR Instance?
 
 It is not recommended.  You will probably need to change the lockType,
  ... the default "native" probably will not work, and you might need to
  change it to "none" to get it working ... but that disables an important
 safety mechanism that prevents index corruption.
 
 http://stackoverflow.com/questions/9599529/solr-over-nfs-problems
 
 Thanks,
 Shawn
 



Re: Solr advanced StopFilterFactory

2015-03-27 Thread sylkaalex
The main goal is to allow each user to use their own stop words list. For
example, a user types "th" and will now see the following results in his
terms search:
the
the one 
the then
then
then and

But the user has the stop word "the" and wants to get these results instead:
then
then and
 





Re: Can SOLR custom analyzer access another field's value?

2015-03-27 Thread Jack Krupansky
You could pre-process the field values in an update processor. You can even
write a snippet in JavaScript. You could check one field and then redirect
a field to an alternate field which has a different analyzer.
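
A rough sketch of such a chain in solrconfig.xml (the script name is
hypothetical; the script itself would inspect the locale field in
processAdd and move values between fields):

<updateRequestProcessorChain name="locale-routing">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">locale-routing.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>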

What expectations do you have as to what analysis should occur at query
time?

-- Jack Krupansky

On Fri, Mar 27, 2015 at 12:22 PM, Alex Sylka sylkaa...@gmail.com wrote:

 I am trying to write a custom analyzer , whose execution is determined by
 the value of another field within the document.

 For example if the locale field in the document has 'de' as the value, then
 the analyzer would use the German set of tokenizers/filters to process the
 value of a field.

 My question is : how can a custom analyzer access the value of another
 field (in this case locale field) within a document, while analyzing the
 value of a specific field?

 There is a solution where we can prepend the locale value to the field's
 value like de|fieldvalue then custom analyzer can extract the locale while
 analyzing the field value. This seems a dirty solution. Is there any better
 solution ?



Can SOLR custom analyzer access another field's value?

2015-03-27 Thread Alex Sylka
I am trying to write a custom analyzer , whose execution is determined by
the value of another field within the document.

For example if the locale field in the document has 'de' as the value, then
the analyzer would use the German set of tokenizers/filters to process the
value of a field.

My question is : how can a custom analyzer access the value of another
field (in this case locale field) within a document, while analyzing the
value of a specific field?

There is a solution where we can prepend the locale value to the field's
value like de|fieldvalue then custom analyzer can extract the locale while
analyzing the field value. This seems a dirty solution. Is there any better
solution ?


Re: solr server datetime

2015-03-27 Thread Erick Erickson
Why do you want to in the first place? I ask because it's a common
trap to think the server time is something that is useful...

That said, it would require a little fiddling, but you can return the
number of milliseconds since January 1, 1970 (standard Unix epoch) by
adding ms(NOW) to your fl parameter. The general case here is that you
can add the results of any function query to the fl list.
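
For example (the alias name is arbitrary):

http://localhost:8983/solr/collection1/select?q=*:*&fl=id,server_time_ms:ms(NOW)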

You could use a DocTransformer, here's a place to start:
https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents

There may be more elegant ways, but that one is easy.

Best,
Erick

On Thu, Mar 26, 2015 at 8:39 PM, fjq fquint...@gmail.com wrote:
 Is it possible to retrieve the server datetime?





Re: SOLR Index in shared/Network folder

2015-03-27 Thread Erick Erickson
To pile on: If you're talking about pointing two Solr instances at the
_same_ index, it doesn't matter whether you are on NFS or not, you'll
have all sorts of problems. And if this is a SolrCloud installation,
it's particularly hard to get right.

Please do not do this unless you have a very good reason, and please
tell us what the reason is so we can perhaps suggest alternatives.

Best,
Erick

On Fri, Mar 27, 2015 at 8:08 AM, Walter Underwood wun...@wunderwood.org wrote:
 Several years ago, I accidentally put Solr indexes on an NFS volume and it 
 was 100X slower.

 If you have enough RAM, query speed should be OK, but startup time (loading 
 indexes into file buffers) could be really long. Indexing could be quite slow.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)


 On Mar 26, 2015, at 11:31 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 3/27/2015 12:06 AM, abhi Abhishek wrote:
 Greetings,
  I am trying to use a network shared location as my index directory.
 Are there any known problems in using a Network File System for running a
 SOLR Instance?

 It is not recommended.  You will probably need to change the lockType,
 ... the default "native" probably will not work, and you might need to
 change it to "none" to get it working ... but that disables an important
 safety mechanism that prevents index corruption.

 http://stackoverflow.com/questions/9599529/solr-over-nfs-problems

 Thanks,
 Shawn




Re: Retrieving list of words for highlighting

2015-03-27 Thread simon
There's a JIRA ( https://issues.apache.org/jira/browse/SOLR-4722 )
 describing a highlighter which returns term positions rather than
snippets, which could then be mapped to  the matching words in the indexed
document (assuming that it's stored or that you have a copy elsewhere).

-Simon

On Wed, Mar 25, 2015 at 7:30 PM, Damien Dykman damien.dyk...@gmail.com
wrote:

 In Solr 5 (or 4), is there an easy way to retrieve the list of words to
 highlight?

 Use case: allow an external application to highlight the matching words
 of a matching document, rather than using the highlighted snippets
 returned by Solr.

 Thanks,
 Damien



Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)

2015-03-27 Thread Erick Erickson
You can simplify things a bit by indexing a batch number guaranteed
to be different between two runs for the same keyField. In fact I'd
make sure it was unique amongst all my runs. Simplest is a timestamp
(assuming you don't start two batches within a millisecond!). So it
looks like this.

get a new timestamp
Add it to _every_ doc in my current run.
issue a delete-by-query like 'q=keyField:A AND timestamp:[* TO timestamp}'
commit

As Shawn says, you have to very carefully control the commits. And
also note that the curly brace at the end is NOT a typo, it excludes
the endpoint.
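
The delete step, written as an XML update message (the timestamp value here
is only illustrative):

<delete><query>keyField:A AND timestamp:[* TO 1427475600000}</query></delete>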

Best,
Erick

On Fri, Mar 27, 2015 at 7:01 AM, Russell Taylor
russell.tay...@interactivedata.com wrote:
 Yes that works and now I have a better understanding of the soft and hard 
 commits to boot.

 Thanks again Shawn.


 Russ.

 -Original Message-
 From: Shawn Heisey [mailto:apa...@elyograg.org]
 Sent: 27 March 2015 13:22
 To: solr-user@lucene.apache.org
 Subject: Re: Replacing a group of documents (Delete/Insert) without a query 
 on the index ever showing an empty list (Docs)

 On 3/27/2015 7:07 AM, Russell Taylor wrote:
 Hi Shawn, thanks for the quick reply.

 I've looked at both methods and I think that they won't work for a number of 
 reasons:

 1)
 uniqueKey:
  I could use the uniqueKey and overwrite the original document but I
 need to remove the documents which are not on my new input list and the 
 issue with the uniqueKey method is I don't know what to delete.

 Documents on the index:
 docs: [
 {
 id:1
 keyField:A
 },{
 id:2
 keyField:A
 },{
 id:3
 keyField:B
 }
 ]
 New Documents to go on index
 docs: [
 {
 id:1
 keyField:A
 },{
 id:3
 keyField:B
 }
 ]
 I would never know that id:2 should be deleted. (on some new document lists 
 the delete list could be in the millions).

 2)
 openSearcher:
 My openSearcher is set to false and I've also commented out autoSoftCommit 
 so I don't get a partial list being returned on a query.
  <!--
  <autoSoftCommit>
     <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
  </autoSoftCommit>
  -->


 So is there another way to keep the original set of documents until the new 
 set has been added to the index?

 If you are 100% in control of when commits with openSearcher=true are sent, 
 which it sounds like you probably are, then you can do anything you want from 
 the start of indexing until commit time, and the user will never see any of 
 it, until the commit happens.  That allows the following relatively simple 
 paradigm:

 1) Delete LOTS of stuff, or perhaps everything in the index with a 
 deleteByQuery of *:* (for all documents).

 2) Index everything you need to index.

 3) Commit.

 Thanks,
 Shawn





Re: Unable to perform search query after changing uniqueKey

2015-03-27 Thread Erick Erickson
You say you re-indexed, did you _completely_ remove the data directory
first, i.e. the parent of the index and, maybe, tlog directories?
I've occasionally seen remnants of old definitions pollute the new
one, and since the uniqueKey is so fundamental I can see it
being a problem.

Best,
Erick

On Fri, Mar 27, 2015 at 1:42 AM, Andrea Gazzarini a.gazzar...@gmail.com wrote:
 Hi Edwin,
 please provide some other detail about your context, (e.g. complete
 stacktrace, query you're issuing)

 Best,
 Andrea


 On 03/27/2015 09:38 AM, Zheng Lin Edwin Yeo wrote:

 Hi everyone,

 I've changed my uniqueKey to another name, instead of using id, on the
 schema.xml.

  However, after I have done the indexing (the indexing is successful), I'm
  not able to perform a search query on it. It gives the error
  java.lang.NullPointerException.

  Is there any other place which I need to configure, besides changing the
  uniqueKey field in schema.xml?

 Regards,
 Edwin




SOLR terms component and finding least frequent terms

2015-03-27 Thread Morris, Paul E.
Dear SOLR users,

I have been using the /terms component to find low occurrence terms in a large 
SOLR index, and this works very well, but it is not possible to filter (fq) the 
results so you are stuck analyzing the whole index.

Other options might be to use SOLR faceting, but  I don't see how to easily 
produce least common facets. Does anyone have experience finding infrequent 
terms through the TermsComponent or via faceting?

Sorry if this is an odd request, but being able to perform this sort of 
analysis would be very useful.

Paul




Re: solr server datetime

2015-03-27 Thread fjq
Erick,

Thank you very much, the ms(NOW) was all I needed.

Best,

Fabricio
On Fri, Mar 27, 2015 at 15:26, Erick Erickson [via Lucene] 
ml-node+s472066n4195883...@n3.nabble.com wrote:

 Why do you want to in the first place? I ask because it's a common
 trap to think the server time is something that is useful...

 That said, it would require a little fiddling, but you can return the
 number of milliseconds since January 1, 1970 (standard Unix epoch) by
 adding ms(NOW) to your fl parameter. The general case here is that you
 can add the results of any function query to the fl list.

 You could use a DocTransformer, here's a place to start:

 https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents

 There may be more elegant ways, but that one is easy.

 Best,
 Erick

 On Thu, Mar 26, 2015 at 8:39 PM, fjq [hidden email] wrote:
  Is it possible to retrieve the server datetime?
 
 
 








Solr 5.0.0 and HDFS

2015-03-27 Thread Joseph Obernberger
I just started up a two shard cluster on two machines using HDFS. When I 
started to index documents, the log shows errors like this. They repeat 
when I execute searches.  All seems well - searches and indexing appear 
to be working.

Possibly a configuration issue?
My HDFS config:
<directoryFactory name="DirectoryFactory"
class="solr.HdfsDirectoryFactory">
<bool name="solr.hdfs.blockcache.enabled">true</bool>
<int name="solr.hdfs.blockcache.slab.count">160</int>
<bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
<int name="solr.hdfs.blockcache.blocksperbank">16384</int>
<bool name="solr.hdfs.blockcache.read.enabled">true</bool>
<bool name="solr.hdfs.blockcache.write.enabled">false</bool>
<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">64</int>
<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">512</int>
<str name="solr.hdfs.home">hdfs://nameservice1:8020/solr5</str>
<str name="solr.hdfs.confdir">/etc/hadoop/conf.cloudera.hdfs1</str>
</directoryFactory>
Thank you!

-Joe


java.lang.IllegalStateException: file: 
BlockDirectory(HdfsDirectory@799d5a0e 
lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory@49838b82) appears 
both in delegate and in cache: cache=[_25.fnm, _2d.si, _2e.nvd, _2b.si, 
_28.tvx, _2c.tvx, _1t.si, _27.nvd, _2b.tvd, _2d_Lucene50_0.pos, _23.nvd, 
_28_Lucene50_0.doc, _28_Lucene50_0.dvd, _2d.fdt, _2c_Lucene50_0.pos, 
_23.fdx, _2b_Lucene50_0.doc, _2d.nvm, _28.nvd, _23.fnm, 
_2b_Lucene50_0.tim, _2e.fdt, _2d_Lucene50_0.doc, _2b_Lucene50_0.dvd, 
_2d_Lucene50_0.dvd, _2b.nvd, _2g.tvx, _28_Lucene50_0.dvm, 
_1v_Lucene50_0.tip, _2e_Lucene50_0.dvm, _2e_Lucene50_0.pos, _2g.fdx, 
_2e.nvm, _2f.fdx, _1s.tvd, _23.nvm, _27.nvm, _1s_Lucene50_0.tip, 
_2c.fnm, _2b.fdt, _2d.fdx, _2c.fdx, _2c.nvm, _2e.fnm, 
_2d_Lucene50_0.dvm, _28.nvm, _28.fnm, _2b_Lucene50_0.tip, 
_2e_Lucene50_0.dvd, _2c.si, _2f.fdt, _2b.fnm, _2e_Lucene50_0.tip, 
_28.si, _28_Lucene50_0.tip, _2f.tvd, _2d_Lucene50_0.tim, _2f.tvx, 
_2b_Lucene50_0.pos, _2e.fdx, _28.fdx, _2c_Lucene50_0.dvd, _2g.tvd, 
_2c_Lucene50_0.tim, _2b.nvm, _23.fdt, _1s_Lucene50_0.tim, 
_28_Lucene50_0.tim, _2c_Lucene50_0.doc, _28.tvd, _2b.tvx, _2c.nvd, 
_2b.fdx, _2c_Lucene50_0.tip, _2e_Lucene50_0.doc, _2e_Lucene50_0.tim, 
_2c.fdt, _27.tvd, _2d.tvd, _2d.tvx, _28_Lucene50_0.pos, 
_2b_Lucene50_0.dvm, _2e.si, _2e.tvd, _2d.fnm, _2c.tvd, _2g.fdt, _2e.tvx, 
_28.fdt, _2d_Lucene50_0.tip, _2c_Lucene50_0.dvm, 
_2d.nvd],delegate=[_10.fdt, _10.fdx, _10.fnm, _10.nvd, _10.nvm, _10.si, 
_10.tvd, _10.tvx, _10_Lucene50_0.doc, _10_Lucene50_0.dvd, 
_10_Lucene50_0.dvm, _10_Lucene50_0.pos, _10_Lucene50_0.tim, 
_10_Lucene50_0.tip, _11.fdt, _11.fdx, _11.fnm, _11.nvd, _11.nvm, _11.si, 
_11.tvd, _11.tvx, _11_Lucene50_0.doc, _11_Lucene50_0.dvd, 
_11_Lucene50_0.dvm, _11_Lucene50_0.pos, _11_Lucene50_0.tim, 
_11_Lucene50_0.tip, _12.fdt, _12.fdx, _12.fnm, _12.nvd, _12.nvm, _12.si, 
_12.tvd, _12.tvx, _12_Lucene50_0.doc, _12_Lucene50_0.dvd, 
_12_Lucene50_0.dvm, _12_Lucene50_0.pos, _12_Lucene50_0.tim, 
_12_Lucene50_0.tip, _13.fdt, _13.fdx, _13.fnm, _13.nvd, _13.nvm, _13.si, 
_13.tvd, _13.tvx, _13_Lucene50_0.doc, _13_Lucene50_0.dvd, 
_13_Lucene50_0.dvm, _13_Lucene50_0.pos, _13_Lucene50_0.tim, 
_13_Lucene50_0.tip, _14.fdt, _14.fdx, _14.fnm, _14.nvd, _14.nvm, _14.si, 
_14.tvd, _14.tvx, _14_Lucene50_0.doc, _14_Lucene50_0.dvd, 
_14_Lucene50_0.dvm, _14_Lucene50_0.pos, _14_Lucene50_0.tim, 
_14_Lucene50_0.tip, _15.fdt, _15.fdx, _15.fnm, _15.nvd, _15.nvm, _15.si, 
_15.tvd, _15.tvx, _15_Lucene50_0.doc, _15_Lucene50_0.dvd, 
_15_Lucene50_0.dvm, _15_Lucene50_0.pos, _15_Lucene50_0.tim, 
_15_Lucene50_0.tip, _1f.fdt, _1f.fdx, _1f.fnm, _1f.nvd, _1f.nvm, _1f.si, 
_1f.tvd, _1f.tvx, _1f_Lucene50_0.doc, _1f_Lucene50_0.dvd, 
_1f_Lucene50_0.dvm, _1f_Lucene50_0.pos, _1f_Lucene50_0.tim, 
_1f_Lucene50_0.tip, _1g.fdt, _1g.fdx, _1g.fnm, _1g.nvd, _1g.nvm, _1g.si, 
_1g.tvd, _1g.tvx, _1g_Lucene50_0.doc, _1g_Lucene50_0.dvd, 
_1g_Lucene50_0.dvm, _1g_Lucene50_0.pos, _1g_Lucene50_0.tim, 
_1g_Lucene50_0.tip, _1h.fdt, _1h.fdx, _1h.fnm, _1h.nvd, _1h.nvm, _1h.si, 
_1h.tvd, _1h.tvx, _1h_Lucene50_0.doc, _1h_Lucene50_0.dvd, 
_1h_Lucene50_0.dvm, _1h_Lucene50_0.pos, _1h_Lucene50_0.tim, 
_1h_Lucene50_0.tip, _1i.fdt, _1i.fdx, _1i.fnm, _1i.nvd, _1i.nvm, _1i.si, 
_1i.tvd, _1i.tvx, _1i_Lucene50_0.doc, _1i_Lucene50_0.dvd, 
_1i_Lucene50_0.dvm, _1i_Lucene50_0.pos, _1i_Lucene50_0.tim, 
_1i_Lucene50_0.tip, _1j.fdt, _1j.fdx, _1j.fnm, _1j.nvd, _1j.nvm, _1j.si, 
_1j.tvd, _1j.tvx, _1j_Lucene50_0.doc, _1j_Lucene50_0.dvd, 
_1j_Lucene50_0.dvm, _1j_Lucene50_0.pos, _1j_Lucene50_0.tim, 
_1j_Lucene50_0.tip, _1k.fdt, _1k.fdx, _1k.fnm, _1k.nvd, _1k.nvm, _1k.si, 
_1k.tvd, _1k.tvx, _1k_Lucene50_0.doc, _1k_Lucene50_0.dvd, 
_1k_Lucene50_0.dvm, _1k_Lucene50_0.pos, _1k_Lucene50_0.tim, 
_1k_Lucene50_0.tip, _1l.fdt, _1l.fdx, _1l.fnm, _1l.nvd, _1l.nvm, _1l.si, 
_1l.tvd, _1l.tvx, _1l_Lucene50_0.doc, 

New To Solr, getting error using the quick start guide

2015-03-27 Thread Will ferrer
Hi

I am new to solr and trying to run through the quick start guide (
http://lucene.apache.org/solr/quickstart.html).

The installation seems fine but then I run:

bin/solr start -e cloud -noprompt

I get:

Welcome to the SolrCloud example!


Starting up 2 Solr nodes for your example SolrCloud cluster.

Starting up SolrCloud node1 on port 8983 using command:

solr start -cloud -s example/cloud/node1/solr -p 8983


Waiting to see Solr listening on port 8983 [|]
Started Solr server on port 8983 (pid=15536). Happy searching!



Starting node2 on port 7574 using command:

solr start -cloud -s example/cloud/node2/solr -p 7574 -z localhost:9983


Waiting to see Solr listening on port 7574 [/]
Started Solr server on port 7574 (pid=15798). Happy searching!


Then I run in another console, because this one is still occupied with solr:

bin/post -c gettingstarted docs/


I get:

java -classpath /usr/lib/solr-5.0.0/dist/solr-core-5.0.0.jar -Dauto=yes
-Dc=gettingstarted -Ddata=files -Drecursive=yes
org.apache.solr.util.SimplePostTool docs/
SimplePostTool version 5.0.0
Posting files to [base] url
http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are
xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory docs (3 files, depth=0)
POSTing file quickstart.html (text/html) to [base]/extract
SimplePostTool: FATAL: Connection error (is Solr running at
http://localhost:8983/solr/gettingstarted/update ?):
java.net.ConnectException: Connection timed out


Meanwhile, in the console that I used to start Solr I get:

WARN  - 2015-03-27 18:41:15.077; org.apache.solr.util.SolrCLI; Request to
http://localhost:8983/solr/admin/info/system failed due to: Connection
refused, sleeping for 5 seconds before re-trying the request ...
Exception in thread main java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:117)
at
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:178)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304)
at
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:610)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:445)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:214)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:160)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:136)
at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:512)
at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:456)
at org.apache.solr.util.SolrCLI.getJson(SolrCLI.java:466)
at org.apache.solr.util.SolrCLI.getZkHost(SolrCLI.java:1113)
at
org.apache.solr.util.SolrCLI$CreateCollectionTool.runTool(SolrCLI.java:1155)
at org.apache.solr.util.SolrCLI.main(SolrCLI.java:203)


SolrCloud example running, please visit http://localhost:8983/solr

The console then exits from the process.

I can open http://localhost:8983/solr/admin/info/system in my web browser
and it has an xml file.

http://localhost:8983/solr/gettingstarted/update in my web browser gives me:

HTTP ERROR 404

Problem accessing /solr/gettingstarted/update. Reason:

Not Found

Powered by Jetty://


http://localhost:8983/solr/#/ shows data in my web browser, but the cloud
tab is empty under graph.

Any advice anyone can give me to get started here with the product would be
very appreciated.

All the best.

Will Ferrer