Re: How partial are partial updates
On Thu, Mar 26, 2015 at 12:23 PM, kennyk ke...@ontoforce.com wrote: Does Solr have to reindex the whole document and not just the modified fields?

Yep, you are right.

If so, can you give me an idea of the amount (factor) of speed gained by partial re-indexing?

The cost is exactly the same as full indexing, and a little worse, because the stored fields have to be read back first. There is a notion of true field updates in Lucene, but it doesn't update the inverted index, nor is it available in Solr. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
How partial are partial updates
Hi all, I have a question. Here https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents I read that "Solr supports several modifiers that atomically update values of a document. This allows updating only specific fields," and that "All original source fields must be stored for field modifiers to work correctly". And here https://wiki.apache.org/solr/Atomic_Updates even more explicitly: "Internally Solr re-adds the document to the index with the updated fields." Does Solr have to reindex the whole document and not just the modified fields? If so, can you give me an idea of the amount (factor) of speed gained by partial re-indexing?
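For reference, an atomic update goes through the normal update handler; only the payload is partial. A minimal sketch, assuming a core named collection1 with an "id" uniqueKey and a stored "price" field (both placeholder names, not from this thread):

  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: application/json' \
    --data-binary '[{"id":"doc1","price":{"set":99}}]'

As the quoted wiki text says, Solr still fetches the stored fields and re-indexes the whole document internally; only the client-side payload is smaller.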
RE: German Compound Splitter words.fst causing problems.
Thanks for the tip Markus. We are using this filter to decompound German words. Update: I am on the path to victory. The words.fst file is actually built by the plugin; however, there is a basic input/output file format mismatch (at the byte level) that doesn't occur with 4.0. As soon as you try to use Lucene core 4.1 with this particular plugin, it breaks with the same error I was getting. The FST code in Lucene says clearly that there is no guaranteed backward compatibility, so there you have it. I'm probably going to need to incorporate some older code from Lucene and/or figure out how to make the plugin work with the new Lucene code. -Chris.

-Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Wednesday, March 25, 2015 6:15 PM To: solr-user@lucene.apache.org Subject: RE: German Compound Splitter words.fst causing problems.

Hello Chris - I don't know the token filter you mention, but I would like to recommend Lucene's HyphenationCompoundWordTokenFilter. It works reasonably well if you provide the hyphenation rules and a dictionary. It has some flaws, such as decompounding to irrelevant subwords, overlapping subwords, or subwords that do not form the whole compound word (minus genitives), but these can be fixed. Markus

-Original message- From: Chris Morley ch...@depahelix.com Sent: Wednesday 25th March 2015 17:59 To: solr-user@lucene.apache.org Subject: German Compound Splitter words.fst causing problems.

Hello, Chris Morley here, of Wayfair.com. I am working on the German compound-splitter by Dawid Weiss. I tried to upgrade the words.fst file that comes with the German compound-splitter using Solr 3.5, but it doesn't work. Below is the IndexNotFoundException that I get.

cmorley@Caracal01:~/Work/oss/git/apache-solr-3.5.0$ java -cp lucene/build/lucene-core-3.5-SNAPSHOT.jar org.apache.lucene.index.IndexUpgrader wordsFst
Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: org.apache.lucene.store.MMapDirectory@/home/cmorley/Work/oss/git/apache-solr-3.5.0/wordsFst lockFactory=org.apache.lucene.store.NativeFSLockFactory@201a755e
at org.apache.lucene.index.IndexUpgrader.upgrade(IndexUpgrader.java:118)
at org.apache.lucene.index.IndexUpgrader.main(IndexUpgrader.java:85)

The reason I'm attempting this at all is the answer here, http://stackoverflow.com/questions/25450865/migrate-solr-1-4-index-files-to-4-7, which says to do the upgrade in a two-step process, first using Solr 3.5 and then the latest Solr version (4.10.3). When I try this and run the unit tests for my modified German compound-splitter, I get this same type of error. The thing is, this is an FST, not an index, which is a little confusing. The reason I'm following this answer, though, is because I'm getting that exact same message when trying to build the (modified) project with maven, at the point at which it tries to load in words.fst. Below.

[main] ERROR com.wayfair.lucene.analysis.de.compound.GermanCompoundSplitter - Format version is not supported (resource: com.wayfair.lucene.analysis.de.compound.InputStreamDataInput@79a66240): 0 (needs to be between 3 and 4). This version of Lucene only supports indexes created with release 3.0 and later. Failed to initialize static data structures for German compound splitter.

Thanks, -Chris.
Installing the auto-phrase-tokenfilter
Hello, I am looking to install the auto-phrase-tokenfilter from https://github.com/LucidWorks/auto-phrase-tokenfilter. Can anyone point me to some documentation on how to do this? Thanks Luis Martinez
Running test cases with ant
Hello, I am trying to run my test cases in Solr using ant. I am using the command below:

ant test -Dtestcase=Test -Dtests.leaveTemporary=true

Now, here I have my own custom schema and solrconfig. On running the above command in the Solr directory, it builds the project again, which overrides my schema.xml and solrconfig.xml. Because of this my test case fails, since it is not able to find the customized schema and config. Let me know any suggestions. Thanks Mrinali
Re: Running test cases with ant
On 3/26/2015 6:40 AM, Mrinali Agarwal wrote: I am trying to run my test cases in Solr using ant. I am using the command below: ant test -Dtestcase=Test -Dtests.leaveTemporary=true. Now, here I have my own custom schema and solrconfig. On running the above command in the Solr directory, it builds the project again, which overrides my schema.xml and solrconfig.xml. Because of this my test case fails, since it is not able to find the customized schema and config. Let me know any suggestions.

Take a look at org.apache.solr.search.TestLFUCache for an example of a test that loads a custom solrconfig. The custom config is here:

solr/core/src/test-files/solr/collection1/conf/solrconfig-caching.xml

The code in TestLFUCache.java that uses that config is:

@BeforeClass
public static void beforeClass() throws Exception {
  initCore("solrconfig-caching.xml", "schema.xml");
}

Thanks, Shawn
Different methods of sending documents to Solr
Hi All, I am trying to post data into Solr using the curl command. Could anybody tell me the difference between the following two methods?

Method 1:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F myfile=@tutorial.html

The -F flag instructs curl to POST data using the Content-Type multipart/form-data and supports the uploading of binary files.

Method 2:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&defaultField=text&commit=true" --data-binary @tutorial.html -H 'Content-type:text/html'

Consider my situation: I want to post many different content-types of files into Solr. Which method should I choose? Thank you so much. Sincerely, Xiaoha
Re: i'm a newb: questions about schema.xml
Yes, this is the correct page, which will tell you more about this managed-schema thing in Solr 5.0.0. I got stuck on this for quite a while previously too. Regards, Edwin

On 27 March 2015 at 08:20, Mark Bramer mbra...@esri.com wrote: Pretty sure I found what I am looking for: https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig I noticed the managed-schema file, and a couple of Google searches with that finally landed me at that link. Interesting that the file is hidden from the Files list in the Admin UI. Thanks!

-Original Message- From: Mark Bramer Sent: Thursday, March 26, 2015 7:42 PM To: 'solr-user@lucene.apache.org' Subject: RE: i'm a newb: questions about schema.xml

Hi Shawn, Definitely helpful to know about the instance and files stuff in Admin. I'm not running cloud, so I looked in the /conf directory but there's no schema.xml. Here's what's in my core's Files: currency.xml elevate.xml lang params.json protwords.txt solrconfig.xml stopwords.txt synonyms.txt

and echoed by ls -l:

-rw-r--r-- 1 root root 3974 Feb 15 11:38 currency.xml
-rw-r--r-- 1 root root 1348 Feb 15 11:38 elevate.xml
drwxr-xr-x 2 root root 4096 Mar 23 10:46 lang
-rw-r--r-- 1 root root 29733 Mar 23 18:04 managed-schema
-rw-r--r-- 1 root root 308 Feb 15 11:38 params.json
-rw-r--r-- 1 root root 873 Feb 15 11:38 protwords.txt
-rw-r--r-- 1 root root 60591 Feb 15 11:38 solrconfig.xml
-rw-r--r-- 1 root root 781 Feb 15 11:38 stopwords.txt
-rw-r--r-- 1 root root 1119 Feb 15 11:38 synonyms.txt

-Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Thursday, March 26, 2015 7:28 PM To: solr-user@lucene.apache.org Subject: Re: i'm a newb: questions about schema.xml

On 3/26/2015 4:57 PM, Mark Bramer wrote: I'm a Solr newb. I've been poking around for several days on my own test instance, and also online at the info available. But one thing just isn't jiving and I can't put my finger on why. I've searched many many times but I don't see what I'm looking for, so I'm thinking perhaps I have a fundamental semantic misunderstanding of something somewhere. Everywhere I read, everyone talks about schema.xml and how important it is. I fully get what it's for, but I don't get where it is, how it's used (by me), how I edit it, and how I create new indexes once I've edited it. I've installed, and am successfully running, Solr 5.0.0 on Linux. I've followed the widely recommended quick start at http://lucene.apache.org/solr/quickstart.html. I get through it fine, I post a bunch of stuff, I use the web UI to query for, and see, data I would expect to see. Should I now have a schema.xml file somewhere that is somehow connected to my new index? If so, where is it? Was it present from install or did it get created when I made my first core (bin/solr create -c ati_docs)?

[root@machine solr-5.0.0]# find -name schema.xml
./example/example-DIH/solr/tika/conf/schema.xml
./example/example-DIH/solr/rss/conf/schema.xml
./example/example-DIH/solr/solr/conf/schema.xml
./example/example-DIH/solr/db/conf/schema.xml
./example/example-DIH/solr/mail/conf/schema.xml
./server/solr/configsets/basic_configs/conf/schema.xml
./server/solr/configsets/sample_techproducts_configs/conf/schema.xml
[root@machine solr-5.0.0]#

Is it the one in /configsets/basic_configs/conf? Is that the default one? If I want to 'modify' schema.xml to do some different indexing/analyzing, how do I start? Make a copy of that schema.xml, move it somewhere else and modify it? If so, how do I create a new index using this schema.xml? Or am I running in schemaless mode? I don't think I am, because it appears that I would have to specifically state this as a command line parameter, i.e. bin/solr start -e schemaless. What fundamentals am I missing? I'm coming to Solr from Elasticsearch, and I've already recognized some differences. Is my ES background clouding my grasp of Solr fundamentals?

Hopefully you know what core you are using, so you can go to the admin UI and find it in the Core Selector dropdown list. Assuming you can do that, you will find yourself looking at the Overview tab for that core.

https://cwiki.apache.org/confluence/display/solr/Using+the+Solr+Administration+User+Interface

Once you are looking at the core overview, in the upper right corner of your browser window is a section called Instance ... which has an entry that is ALSO called Instance. Inside the directory indicated by that field, you should have a conf directory. The config and schema for that index are found in that conf directory. If you're running SolrCloud, then you can forget everything I just said ... the active configs will be found within the zookeeper database, and you can use the Cloud->Tree tab in the admin UI to find your collections and see which configName is linked to each one. You'll want to become familiar with the zkcli script in server/scripts/cloud-scripts.
solr server datetime
Is it possible to retrieve the server datetime?
Re: How to create a core by API?
On Thu, Mar 26, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, looks like I stand corrected. I haven't kept complete track there; looks like this one didn't stick in my head.

I'm not saying you're wrong. The configSet parameter doesn't work at all in my setup, so you might be right... I'm just wondering where that's documented. I thought Solr documentation was rough back in the 1.6 days, but wow... it's gotten shockingly bad in Solr 5.

As far as the docs are concerned, all patches welcome!

What kind of patch do you mean? Isn't all the documentation maintained on Confluence? -- Mark E. Haase 202-815-0201
Re: How to create a core by API?
Okay, thanks for the feedback. I'll admit that I do find the cloud vs non-cloud deployment options a constant source of confusion, not the least of which is due to the name. If I run a single Solr instance on EC2, that's not cloud, but if I run a few instances with ZK on my local LAN, that is cloud. Mmmkay.

I can't imagine why the API documentation wouldn't mention that the API can't actually do the thing it's supposed to do (create a core). What's the purpose of having an HTTP API if I'm expected to already have write access to the host's file system to use it? Maybe it's intended as a private API? It should only be used by Solr itself, e.g. `solr create -c foo` uses the Cores Admin API to do some (but not all) of its work. But if that's the case, then the API docs should say that. From an API consumer's point of view, I'm not really interested in being forced to learn the history of the project to use the API. The whole point of creating APIs is to abstract out details that the caller doesn't need to know, and yet this API requires an understanding of Solr's internal file structure and history of the project? Yikes.

On Thu, Mar 26, 2015 at 12:56 PM, Erick Erickson erickerick...@gmail.com wrote: Ok, you're being confused by cloud, non-cloud and all that kind of stuff. Configsets are SolrCloud only, so forget them, since you specified it's not SolrCloud. bq: surely the HTTP API doesn't require the caller to create a directory and copy files first, does it. In fact, yes. The thing to remember here is that you're using a much older approach that had its roots in the pre-cloud days. The problem is: how do you ensure that the configurations are on the node you're creating the core on? The whole configsets discussion is an attempt to solve that in SolrCloud by putting the configs in a place any Solr instance can find them, namely Zookeeper. But in non-cloud mode, there's no central repository. You could be firing the query from node X and creating the core on node Y. So Solr expects the config files to already be in place; you have to manually copy them to node Y anyway, so why not copy them to the place they'll be needed? The scripts assume that you're running on the same node you're running the scripts on, for quick-start purposes. Best, Erick

On Thu, Mar 26, 2015 at 9:24 AM, Mark E. Haase meha...@gmail.com wrote: I can't get the Core Admin API to work. I have a brand new installation of Solr 5.0.0 (in non-cloud mode). I installed using the installation script (a nice addition!) with default options, so I have Solr in /opt/solr and its data in /var/solr. Here's what I'm trying:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core'

But I get this error: Error CREATEing SolrCore 'new_core': Unable to create core [new_core] Caused by: Can't find resource 'solrconfig.xml' in classpath or '/var/solr/data/new_core/conf'. Solr isn't even creating /var/solr/data/new_core, which I guess is the root of the problem. But /var/solr is owned by the solr user and I can do `sudo -u solr mkdir /var/solr/data/new_core` just fine. So why isn't Solr making this directory? I see that 'instanceDir' is required, but I don't get an error message if I *don't* use it, so I'm not sure how required it actually is. I'm also not sure if it's supposed to be a full path or a relative path or what, so here are a couple of other guesses at the correct incantation:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=new_core'
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=/var/solr/data/new_core'

These both return the same error message as my first try, so no dice... FWIW, I get the same error message even if I try doing this with the Solr Admin GUI, so I'm really puzzled. Is the GUI supposed to work? I found a thread on Stack Overflow about this same problem (http://stackoverflow.com/a/28945428/122763) that suggests using configSet. Okay, the installer put some config sets in /opt/solr/server/solr/configsets, and the 'basic_config' config set has a solrconfig.xml in it, so maybe that would solve my solrconfig.xml error? If I compare the HTTP API to the `solr create -c foo` script, it appears that the script creates the instance directory and copies in conf files *before* it calls the HTTP API... surely the HTTP API doesn't require the caller to create a directory and copy files first, does it? -- Mark E. Haase -- Mark E. Haase 202-815-0201
Re: How to create a core by API?
Erick, are you sure that configSets don't apply to single-node Solr instances? https://cwiki.apache.org/confluence/display/solr/Config+Sets I don't see anything about SolrCloud there. Also, configSet is a documented argument to the Core Admin API: https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-CREATE And one of the few things [I thought] I knew about cloud vs non-cloud setups was that the Collections API is for cloud and the Cores API is for non-cloud, right? So why would the non-cloud API take a cloud-only argument?

On Thu, Mar 26, 2015 at 1:16 PM, Mark E. Haase meha...@gmail.com wrote: Okay, thanks for the feedback. I'll admit that I do find the cloud vs non-cloud deployment options a constant source of confusion, not the least of which is due to the name. If I run a single Solr instance on EC2, that's not cloud, but if I run a few instances with ZK on my local LAN, that is cloud. Mmmkay. I can't imagine why the API documentation wouldn't mention that the API can't actually do the thing it's supposed to do (create a core). What's the purpose of having an HTTP API if I'm expected to already have write access to the host's file system to use it? Maybe it's intended as a private API? It should only be used by Solr itself, e.g. `solr create -c foo` uses the Cores Admin API to do some (but not all) of its work. But if that's the case, then the API docs should say that. From an API consumer's point of view, I'm not really interested in being forced to learn the history of the project to use the API. The whole point of creating APIs is to abstract out details that the caller doesn't need to know, and yet this API requires an understanding of Solr's internal file structure and history of the project? Yikes.

On Thu, Mar 26, 2015 at 12:56 PM, Erick Erickson erickerick...@gmail.com wrote: Ok, you're being confused by cloud, non-cloud and all that kind of stuff. Configsets are SolrCloud only, so forget them, since you specified it's not SolrCloud. bq: surely the HTTP API doesn't require the caller to create a directory and copy files first, does it. In fact, yes. The thing to remember here is that you're using a much older approach that had its roots in the pre-cloud days. The problem is: how do you ensure that the configurations are on the node you're creating the core on? The whole configsets discussion is an attempt to solve that in SolrCloud by putting the configs in a place any Solr instance can find them, namely Zookeeper. But in non-cloud mode, there's no central repository. You could be firing the query from node X and creating the core on node Y. So Solr expects the config files to already be in place; you have to manually copy them to node Y anyway, so why not copy them to the place they'll be needed? The scripts assume that you're running on the same node you're running the scripts on, for quick-start purposes. Best, Erick

On Thu, Mar 26, 2015 at 9:24 AM, Mark E. Haase meha...@gmail.com wrote: I can't get the Core Admin API to work. I have a brand new installation of Solr 5.0.0 (in non-cloud mode). I installed using the installation script (a nice addition!) with default options, so I have Solr in /opt/solr and its data in /var/solr. Here's what I'm trying:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core'

But I get this error: Error CREATEing SolrCore 'new_core': Unable to create core [new_core] Caused by: Can't find resource 'solrconfig.xml' in classpath or '/var/solr/data/new_core/conf'. Solr isn't even creating /var/solr/data/new_core, which I guess is the root of the problem. But /var/solr is owned by the solr user and I can do `sudo -u solr mkdir /var/solr/data/new_core` just fine. So why isn't Solr making this directory? I see that 'instanceDir' is required, but I don't get an error message if I *don't* use it, so I'm not sure how required it actually is. I'm also not sure if it's supposed to be a full path or a relative path or what, so here are a couple of other guesses at the correct incantation:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=new_core'
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=/var/solr/data/new_core'

These both return the same error message as my first try, so no dice... FWIW, I get the same error message even if I try doing this with the Solr Admin GUI, so I'm really puzzled. Is the GUI supposed to work? I found a thread on Stack Overflow about this same problem (http://stackoverflow.com/a/28945428/122763) that suggests using configSet. Okay, the installer put some config sets in /opt/solr/server/solr/configsets, and the 'basic_config' config set has a solrconfig.xml in it, so maybe that would solve my solrconfig.xml error? If I compare the HTTP API to the `solr create -c foo` script, it appears that the script creates the instance directory and copies in conf files *before* it calls the HTTP API... surely the HTTP API doesn't require the caller to create a directory and copy files first, does it?
Re: Solr Monitoring - Stored Stats?
Have a look at the admin UI, plugins/stats. I've just spent the time to re-implement it in AngularJS, so I know the functionality is there - twice :-) You can "watch for changes" - it pulls in a reference XML, and posts that back to the server, which only reports back changes. Dunno if that gives you what you are after? Upayavira

On Thu, Mar 26, 2015, at 03:15 PM, Matt Kuiper wrote: Erick, Shawn, Thanks for your responses. I figured this was the case, just wanted to check to be sure. I have used Zabbix to configure JMX points to monitor over time, but it was a bit of work to get configured. We are looking to create a simple dashboard of a few stats over time. Looks like the easiest approach will be to make an app that calls for these stats at a regular interval and then indexes the results to Solr, and then we will be able to query over desired time frames... Thanks, Matt

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 25, 2015 10:30 AM To: solr-user@lucene.apache.org Subject: Re: Solr Monitoring - Stored Stats?

Matt: Not really. There are a bunch of third-party log analysis tools that give much of this information (not everything exposed by JMX is in the log files, though). Not quite sure whether things like Nagios, Zabbix and the like have this kind of stuff built in; it seems like a natural extension of those kinds of tools, though. Not much help here... Erick

On Wed, Mar 25, 2015 at 8:26 AM, Matt Kuiper matt.kui...@issinc.com wrote: Hello, I am familiar with the JMX points that Solr exposes to allow for monitoring of statistics like QPS, numdocs, average query time... I am wondering if there is a way to configure Solr to automatically store the value of these stats over time (for a given time interval), and then allow a user to query a stat over a time range. So for the QPS stat, the query might return a set that includes the QPS value for each hour in the time range specified. Thanks, Matt
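For a polling app like the one Matt describes, the same statistics behind the admin UI's plugins/stats screen are also reachable over plain HTTP via the mbeans handler, so the collector can be a simple cron job; a sketch, with the core name as a placeholder:

  curl 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&wt=json&indent=true'

Each poll returns the current values (request counts, numDocs, average query times, etc.), which can then be timestamped and indexed back into Solr for time-range queries.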
Uneven index distribution using composite router
Hi, I'm using a three-level composite router in a Solr Cloud environment, primarily for multi-tenancy and field collapsing. The format is language!topic!url. An example would be:

ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

The Solr Cloud cluster contains 2 shards, each having 3 replicas. After indexing around 10 million documents, I'm observing that the index size in shard 1 is around 60gb while shard 2 is 15gb. So the bulk of the data is getting indexed in shard 1. Since 60% of the documents are English, I expect the index size to be higher on one shard, but the difference seems a little too high. The idea is to make sure that all ENU!12345 documents are routed to one shard so that distributed field collapsing works. Is there something I can do differently here to make a better distribution? Any pointers will be appreciated. Regards, Shamik
How to create a core by API?
I can't get the Core Admin API to work. I have a brand new installation of Solr 5.0.0 (in non-cloud mode). I installed using the installation script (a nice addition!) with default options, so I have Solr in /opt/solr and its data in /var/solr. Here's what I'm trying:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core'

But I get this error: Error CREATEing SolrCore 'new_core': Unable to create core [new_core] Caused by: Can't find resource 'solrconfig.xml' in classpath or '/var/solr/data/new_core/conf'

Solr isn't even creating /var/solr/data/new_core, which I guess is the root of the problem. But /var/solr is owned by the solr user and I can do `sudo -u solr mkdir /var/solr/data/new_core` just fine. So why isn't Solr making this directory? I see that 'instanceDir' is required, but I don't get an error message if I *don't* use it, so I'm not sure how required it actually is. I'm also not sure if it's supposed to be a full path or a relative path or what, so here are a couple of other guesses at the correct incantation:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=new_core'
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=/var/solr/data/new_core'

These both return the same error message as my first try, so no dice... FWIW, I get the same error message even if I try doing this with the Solr Admin GUI, so I'm really puzzled. Is the GUI supposed to work? I found a thread on Stack Overflow about this same problem (http://stackoverflow.com/a/28945428/122763) that suggests using configSet. Okay, the installer put some config sets in /opt/solr/server/solr/configsets, and the 'basic_config' config set has a solrconfig.xml in it, so maybe that would solve my solrconfig.xml error? If I compare the HTTP API to the `solr create -c foo` script, it appears that the script creates the instance directory and copies in conf files *before* it calls the HTTP API... surely the HTTP API doesn't require the caller to create a directory and copy files first, does it? -- Mark E. Haase
Re: Applying Tokenizers and Filters to CopyFields
Glad it worked out... Looking back, I can't believe I didn't mention adding debug=query to the URL. That would have shown you exactly what the parsed query looked like, and you'd have seen right off that it wasn't searching against the field you thought it was. It's one of the first things I do when queries don't return what I expect. Glad it's working for you! Erick

On Thu, Mar 26, 2015 at 8:24 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Glad you are sorted out! Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. "The Science of Influence Marketing" 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/

On Thu, Mar 26, 2015 at 10:09 AM, Martin Wunderlich martin...@gmx.net wrote: Thanks so much, Erick and Michael, for all the additional explanation. The crucial information in the end turned out to be the one about the Default Search Field ("df"). In solrconfig.xml this parameter was set to point to the original text, which is why the expanded queries didn't work. When I set the df parameter to one of the fields with the expanded text, the search works fine. I have also removed the copyField declarations. It's all working as expected now. Thanks again for the help. Cheers, Martin

On 25.03.2015 at 23:43, Erick Erickson erickerick...@gmail.com wrote: Martin: Perhaps this would help:

indexed=true, stored=true: the field can be searched. The raw input (not analyzed in any way) can be shown to the user in the results list.
indexed=true, stored=false: the field can be searched. However, the field can't be returned in the results list with the document.
indexed=false, stored=true: the field cannot be searched, but the contents can be returned in the results list with the document. There are some use-cases where this is desirable behavior.
indexed=false, stored=false: the entire field is thrown out; it's just as if you didn't send the field to be indexed at all.

And one other thing: the copyField gets the _raw_ data, not the analyzed data. Let's say you have two fields, src and dst. Copying from src to dst in schema.xml is identical to:

<add>
  <doc>
    <field name="src">original text</field>
    <field name="dst">original text</field>
  </doc>
</add>

that is, copyField directives are not chained. Also, watch out for your query syntax. Michael's comments are spot-on; I'd just add this:

http://localhost:8983/solr/windex/select?q=Sprache&fq=original&wt=json&indent=true

is kind of odd. Let's assume you mean qf rather than fq. That _only_ matters if your query parser is edismax; it'll be ignored in this case, I believe. You'd want something like q=src:Sprache or q=dst:Sprache or even

http://localhost:8983/solr/windex/select?q=Sprache&df=src
http://localhost:8983/solr/windex/select?q=Sprache&df=dst

where df is "default field" and the search is applied against that field in the absence of a field qualification like my first two examples. Best, Erick

On Wed, Mar 25, 2015 at 2:52 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I agree the terminology is possibly a little confusing. Stored refers to values that are stored verbatim. You can retrieve them verbatim. Analysis does not affect stored values. Indexed values are tokenized/transformed and stored inverted. You can't recover the literal analyzed version (at least, not easily). If what you really want is to store and retrieve case-folded versions of your data as well as the original, you need to use something like an UpdateRequestProcessor, which I personally am less familiar with.

On Wed, Mar 25, 2015 at 5:28 PM, Martin Wunderlich martin...@gmx.net wrote: So, the pre-processing steps are applied under <analyzer type="index">. And this point is not quite clear to me: Assuming that I have a simple case-folding step applied to the target of the copyField: how or where are the lower-case tokens stored, if the text isn't added to the index? How is the query supposed to retrieve the lower-case version? (Sorry if this sounds like a naive question, but I have a feeling that I am missing something really basic here.)

Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. "The Science of Influence Marketing" 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/
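To make Erick's debug=query tip concrete, the check from this thread would look something like this (core and field names taken from the messages above):

  curl 'http://localhost:8983/solr/windex/select?q=Sprache&df=src&debug=query&wt=json&indent=true'

The parsedquery entry in the debug section of the response shows exactly which field the term was searched against, which is what exposed the df problem here.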
Re: How to create a core by API?
Hmmm, looks like I stand corrected. I haven't kept complete track there, looks like this one didn't stick in my head. As far as the docs are concerned, all patches welcome! Best, Erick On Thu, Mar 26, 2015 at 10:26 AM, Mark E. Haase meha...@gmail.com wrote: Erick, are you sure that configSets don't apply to single-node Solr instances? https://cwiki.apache.org/confluence/display/solr/Config+Sets I don't see anything about Solr cloud there. Also, configSet is a documented argument to the Core Admin API: https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-CREATE And one of the few things [I thought] I knew about cloud vs non cloud setups was the Collections API is for cloud and Cores API is for non cloud, right? So why would the non-cloud API take a cloud-only argument? On Thu, Mar 26, 2015 at 1:16 PM, Mark E. Haase meha...@gmail.com wrote: Okay, thanks for the feedback. I'll admit that I do find the cloud vs non-cloud deployment options a constant source of confusion, not the least of which is due to the name. If I run a single Solr instance on EC2, that's not cloud, but if I run a few instances with ZK on my local LAN, that is cloud. Mmmkay. I can't imagine why the API documentation wouldn't mention that the API can't actually do the thing it's supposed to do (create a core). What's the purpose of having an HTTP API if I'm expected to already have write access to the host's file system to use it? Maybe its intended as private API? It should only be used by Solr itself, e.g. `solr create -c foo` uses the Cores Admin API to do some (but not all) of its work. But if that's the case, then the API docs should say that. From an API consumer's point of view, I'm not really interested in being forced to learn the history of the project to use the API. The whole point of creating APIs is to abstract out details that the caller doesn't need to know, and yet this API requires an understanding of Solr's internal file structure and history of the project? Yikes. On Thu, Mar 26, 2015 at 12:56 PM, Erick Erickson erickerick...@gmail.com wrote: Ok, you're being confused by cloud, non cloud and all that kinda stuff Configsets are SolrCloud only, so forget them since you specified it's not SolrCloud. bq: surely the HTTP API doesn't require the caller to create a directory and copy files first, does it In fact, yes. The thing to remember here is that you're using a much older approach that had its roots in the pre-cloud days. The problem is how do you insure that the configurations are on the node you're creating the core on? The whole configsets discussion is an attempt to solve that in SolrCloud by putting the configs in a place any Solr instance can find them, namely Zookeeper. But in non-cloud, there's no central repository. You could be firing the query from node X and creating the core on node Y. So Solr expects the config files to already be in place; you have to manually copy them to node Y anyway, why not copy them to the place they'll be needed? The scripts make an assumption that you're running on the same node you're running the scripts for quick-start purposes. Best, Erick On Thu, Mar 26, 2015 at 9:24 AM, Mark E. Haase meha...@gmail.com wrote: I can't get the Core Admin API to work. I have a brand new installation of Solr 5.0.0 (in non-cloud mode). I installed using the installation script (a nice addition!) with default options, so I have Solr in /opt/solr and its data in /var/solr. 
Here's what I'm trying: curl ' http://localhost:8983/solr/admin/cores?action=CREATEname=new_core ' But I get this error: Error CREATEing SolrCore 'new_core': Unable to create core [new_core] Caused by: Can't find resource 'solrconfig.xml' in classpath or '/var/solr/data/new_core/conf' Solr isn't even creating /var/solr/data/new_core, which I guess is the root of the problem. But /var/solr is owned by the solr user and I can do `sudo -u solr mkdir /var/solr/data/new_core` just fine. So why isn't Solr making this directory? I see that 'instanceDir' is required, but I don't get an error message if I *don't* use it, so I'm not sure how required it actually is. I'm also not sure if its supposed to be a full path or a relative path or what, so here are a couple of other guesses at the correct incantation: curl ' http://localhost:8983/solr/admin/cores?action=CREATEname=new_coreinstanceDir=new_core ' curl ' http://localhost:8983/solr/admin/cores?action=CREATEname=new_coreinstanceDir=/var/solr/data/new_core ' These both return the same error message as my first try, so no dice... FWIW, I get the same error message even if I try doing this with the Solr Admin GUI so I'm really puzzled. Is the GUI supposed to work? I found a thread on Stack Overflow about this same problem ( http://stackoverflow.com/a/28945428/122763) that suggests using
Re: How to create a core by API?
Ok, you're being confused by cloud, non cloud and all that kinda stuff Configsets are SolrCloud only, so forget them since you specified it's not SolrCloud. bq: surely the HTTP API doesn't require the caller to create a directory and copy files first, does it In fact, yes. The thing to remember here is that you're using a much older approach that had its roots in the pre-cloud days. The problem is how do you insure that the configurations are on the node you're creating the core on? The whole configsets discussion is an attempt to solve that in SolrCloud by putting the configs in a place any Solr instance can find them, namely Zookeeper. But in non-cloud, there's no central repository. You could be firing the query from node X and creating the core on node Y. So Solr expects the config files to already be in place; you have to manually copy them to node Y anyway, why not copy them to the place they'll be needed? The scripts make an assumption that you're running on the same node you're running the scripts for quick-start purposes. Best, Erick On Thu, Mar 26, 2015 at 9:24 AM, Mark E. Haase meha...@gmail.com wrote: I can't get the Core Admin API to work. I have a brand new installation of Solr 5.0.0 (in non-cloud mode). I installed using the installation script (a nice addition!) with default options, so I have Solr in /opt/solr and its data in /var/solr. Here's what I'm trying: curl 'http://localhost:8983/solr/admin/cores?action=CREATEname=new_core ' But I get this error: Error CREATEing SolrCore 'new_core': Unable to create core [new_core] Caused by: Can't find resource 'solrconfig.xml' in classpath or '/var/solr/data/new_core/conf' Solr isn't even creating /var/solr/data/new_core, which I guess is the root of the problem. But /var/solr is owned by the solr user and I can do `sudo -u solr mkdir /var/solr/data/new_core` just fine. So why isn't Solr making this directory? I see that 'instanceDir' is required, but I don't get an error message if I *don't* use it, so I'm not sure how required it actually is. I'm also not sure if its supposed to be a full path or a relative path or what, so here are a couple of other guesses at the correct incantation: curl ' http://localhost:8983/solr/admin/cores?action=CREATEname=new_coreinstanceDir=new_core ' curl ' http://localhost:8983/solr/admin/cores?action=CREATEname=new_coreinstanceDir=/var/solr/data/new_core ' These both return the same error message as my first try, so no dice... FWIW, I get the same error message even if I try doing this with the Solr Admin GUI so I'm really puzzled. Is the GUI supposed to work? I found a thread on Stack Overflow about this same problem ( http://stackoverflow.com/a/28945428/122763) that suggests using configSet. Okay, the installer put some configs sets in /opt/solr/server /opt/solr/server/solr/configsets, and the 'basic_config' config set has a solrconfig.xml in it, so maybe that would solve my solrconfig.xml error? If I compare the HTTP API to the `solr create -c foo` script, it appears that the script creates the instance directory and copies in conf files *before *it calls the HTTP API... surely the HTTP API doesn't require the caller to create a directory and copy files first, does it? -- Mark E. Haase
Re: Uneven index distribution using composite router
Right, when you take over routing, making sure the distribution is even is now your responsibility. Your assumption is that the amount of _text_ in each doc is roughly the same between your three languages; have you verified this? And are you doing anything like copyFields that kick in on one shard but not the others (e.g. if you have text_en fields you might be copying them to text_en_all but not doing so with text_ger to text_ger_all)? That's totally a shot in the dark, though. Best, Erick

On Thu, Mar 26, 2015 at 10:26 AM, Shamik Bandopadhyay sham...@gmail.com wrote: Hi, I'm using a three-level composite router in a Solr Cloud environment, primarily for multi-tenancy and field collapsing. The format is language!topic!url. An example would be:

ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

The Solr Cloud cluster contains 2 shards, each having 3 replicas. After indexing around 10 million documents, I'm observing that the index size in shard 1 is around 60gb while shard 2 is 15gb. So the bulk of the data is getting indexed in shard 1. Since 60% of the documents are English, I expect the index size to be higher on one shard, but the difference seems a little too high. The idea is to make sure that all ENU!12345 documents are routed to one shard so that distributed field collapsing works. Is there something I can do differently here to make a better distribution? Any pointers will be appreciated. Regards, Shamik
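One way to measure the skew directly, rather than inferring it from index size on disk, is to compare per-shard document counts; shards.info=true reports numFound for each shard (the collection name below is a placeholder):

  curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=0&shards.info=true&wt=json&indent=true'

If the document counts are close but the on-disk sizes are not, the imbalance is in document size (e.g. the copyField scenario Erick describes) rather than in the routing itself.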
Re: How to create a core by API?
Go to the comments section and add any corrections you'd like; that'll get bubbled up. Best, Erick

On Thu, Mar 26, 2015 at 10:45 AM, Mark E. Haase meha...@gmail.com wrote: On Thu, Mar 26, 2015 at 1:31 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, looks like I stand corrected. I haven't kept complete track there; looks like this one didn't stick in my head. I'm not saying you're wrong. The configSet parameter doesn't work at all in my setup, so you might be right... I'm just wondering where that's documented. I thought Solr documentation was rough back in the 1.6 days, but wow... it's gotten shockingly bad in Solr 5. As far as the docs are concerned, all patches welcome! What kind of patch do you mean? Isn't all the documentation maintained on Confluence? -- Mark E. Haase 202-815-0201
Re: How to create a core by API?
On 3/26/2015 10:24 AM, Mark E. Haase wrote: I can't get the Core Admin API to work. I have a brand new installation of Solr 5.0.0 (in non-cloud mode). I installed using the installation script (a nice addition!) with default options, so I have Solr in /opt/solr and its data in /var/solr. Here's what I'm trying:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core'

But I get this error: Error CREATEing SolrCore 'new_core': Unable to create core [new_core] Caused by: Can't find resource 'solrconfig.xml' in classpath or '/var/solr/data/new_core/conf'

The error message tells you what is wrong. The CoreAdmin API requires that the instanceDir already exist, with a conf directory inside it that contains solrconfig.xml, schema.xml, and any other necessary config files. If you want completely from-scratch creation without any existing filesystem layout, you will need to run SolrCloud, which keeps config files in the zookeeper database. At that point you would be using the Collections API.

If you go to Core Admin in the admin UI and click the Add Core button, you will see the following note: "instanceDir and dataDir need to exist before you can create the core". This message is not quite accurate -- the dataDir (defaulting to ${instanceDir}/data) will be created if it does not already exist, provided the user running Solr has the required permissions to create it. The message also doesn't say anything about the conf directory or the two required XML files.

Thanks, Shawn
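Putting Shawn's answer together with the layout from this thread, the non-cloud sequence would look roughly like this (an untested sketch; paths assume the 5.0 installer defaults mentioned above):

  sudo -u solr mkdir -p /var/solr/data/new_core
  sudo -u solr cp -r /opt/solr/server/solr/configsets/basic_configs/conf /var/solr/data/new_core/conf
  curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=new_core&instanceDir=/var/solr/data/new_core'

That is, the conf directory with solrconfig.xml and schema.xml goes in place first, and only then is CREATE called.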
Performance json vs javabin
Has anyone done performance tests between JSON and javabin? The scale tipped towards javabin when compared to XML (https://issues.apache.org/jira/browse/SOLR-486). I am curious to know whether it is the same with JSON when the load is 600 requests per minute, for example. Thanks,
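I'm not aware of published numbers for javabin vs. JSON specifically, but since the response writer is just the wt parameter, a crude first measurement for a given setup is easy (core name and query are placeholders):

  time curl -s 'http://localhost:8983/solr/collection1/select?q=*:*&rows=1000&wt=javabin' -o /dev/null
  time curl -s 'http://localhost:8983/solr/collection1/select?q=*:*&rows=1000&wt=json' -o /dev/null

This only measures serialization and transfer; in SolrJ, which uses javabin by default, client-side parsing cost would also matter at 600 requests per minute.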
Re: Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)
On 3/26/2015 9:53 AM, Russell Taylor wrote: I have an index which is made up of groups of documents, where each group is defined by a field called keyField (keyField:A). I need to delete all the keyField:A documents and replace them with a brand new set, without the index ever returning zero documents on a query. At the moment I deleteByQuery keyField:A and then insert a SolrInputDocument list via SolrJ into my index. I have a small time period where somebody doing q=keyField:A can be returned an empty list. FYI: the keyField group might be just 100 documents or up to 10 million.

As long as you don't have any commits with openSearcher=true happening between the delete and the insert, that would work ... but why go through the manual delete if you don't have to? If you define a suitable uniqueKey field in your schema, simply indexing a new document with the same value in the uniqueKey field as an existing document will delete the old document. https://wiki.apache.org/solr/UniqueKey

Thanks, Shawn
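In other words, with a uniqueKey defined, re-posting the replacement batch is itself the delete. A sketch, assuming an "id" uniqueKey and that the replacement documents reuse the same ids (field and core names here are hypothetical):

  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: application/json' \
    --data-binary '[{"id":"A-1","keyField":"A","body":"replacement"},
                    {"id":"A-2","keyField":"A","body":"replacement"}]'

Any old ids that are not reused by the new set would still need an explicit delete, issued before the next commit so that no searcher ever sees the gap.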
delta import on changes in entity within a document
I have the following data-config:

<document name="locations">
  <entity pk="id" name="location"
          query="select * from locations WHERE isapproved='true'"
          deltaImportQuery="select * from locations WHERE updatedate &lt; getdate() AND isapproved='true' AND id='${dataimporter.delta.id}'"
          deltaQuery="select id from locations where isapproved='true' AND updatedate &gt; '${dataimporter.last_index_time}'">
    <entity name="offerdetails"
            query="SELECT title as offer_title, ISNULL(img,'') as offer_thumb, id as offer_id, startdate as offer_startdate, enddate as offer_enddate, description as offer_description, updatedate as offer_updatedate FROM offers WHERE objectid=${location.id}">
    </entity>
  </entity>
</document>

Now, when the object in the [locations] table is updated, my delta import (/dataimport?command=delta-import) query works perfectly. But when an offer is updated in the [offers] table, this is not seen by the delta-import command. Is there a way to delta-import only the updated offers for the respective location if an offer is updated? And then without: a. having to fully import ALL locations, or b. having to update this single location and then do a regular delta-import?
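DIH does have a mechanism aimed at this case: give the child entity its own deltaQuery (to find changed offers) plus a parentDeltaQuery (to map each changed offer back to its parent location's pk). A hedged, untested sketch, with column names taken from the config above:

<entity name="offerdetails"
        pk="offer_id"
        query="SELECT title as offer_title, ISNULL(img,'') as offer_thumb, id as offer_id, startdate as offer_startdate, enddate as offer_enddate, description as offer_description, updatedate as offer_updatedate FROM offers WHERE objectid=${location.id}"
        deltaQuery="SELECT id, objectid FROM offers WHERE updatedate &gt; '${dataimporter.last_index_time}'"
        parentDeltaQuery="SELECT id FROM locations WHERE id=${offerdetails.objectid}">
</entity>

Note this still re-imports the whole location document for each changed offer (DIH always rebuilds the full parent row), but it avoids both a full import and a manual per-location update.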
Re: Uneven index distribution using composite router
Thanks for your reply Erick. In my case, I've 14 languages, out of which 50% of the documents belong to English. German and CHS will probably constitute another 25%. I'm not using copyField; rather, each language has its own dedicated fields, such as title_enu, text_enu, title_ger, text_ger, etc. Since I know the language prior to index time, this works for me. I've added one more sample key in the example:

ENU!12345!www.testurl.com/enu/doc1
ENU!12345!www.testurl.com/enu/doc10
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

As you can see, there are 2 documents in English having the same topic id (12345). I added the topic id as part of the key to make sure that they reside in the same shard, in order to make field collapsing work on topic id. I can perhaps remove the composite key and only have language and url, something like ENU!www.testurl.com/enu/doc1, but that'll probably not solve the distribution issue. You mentioned "when you take over routing, making sure the distribution is even is now your responsibility". I'm wondering, what's the best practice to make that happen? I can move away from the composite router and manually assign a group of languages to a dedicated shard, both at index and query time, but I'm not sure keeping a map is an efficient way of dealing with it.
Re: How to create a core by API?
On Thu, Mar 26, 2015 at 1:45 PM, Mark E. Haase meha...@gmail.com wrote: I'm not saying you're wrong. The configSet parameter doesn't work at all in my set up, so you might be right... I'm just wondering where that's documented.

Trying on current trunk, I got it to work:

/opt/code/lusolr_trunk/solr$ curl -XPOST 'http://localhost:8983/solr/admin/cores?action=CREATE&name=demo3&instanceDir=demo3&configSet=basic_configs'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">769</int></lst><str name="core">demo3</str>
</response>

Although I'm not thrilled with a different parameter name for cloud vs non-cloud. I come from the camp that believes that overloading is both natural and easily understood (e.g. I don't find foo + bar and 1.5 + 2.5 both using + confusing). -Yonik
Re: Build index from Oracle, adding fields
On 27/03/2015 12:42, Shawn Heisey wrote: If that's not practical, then the only real option you have is to drop back to one entity, and build a single SELECT statement (using JOIN and some form of CONCAT) that will gather all the information from all the tables at the same time, and combine multiple values together into one SQL result field with some kind of delimiter. Then you can use the RegexTransformer's splitBy functionality to turn the concatenated data back into multiple values for your multi-valued field. Database servers tend to be REALLY good at JOIN operations, so the database would be doing the heavy lifting.

I did try that, in fact (and do it with one of my other indexes). However, with this index the sub-select can return 200 rows of 200 characters, and that blows up in Oracle, as the concatenated field is over 4000 characters long (and the work-around for that is to use CLOBs, but those have their own performance problems). Currently I am doing this by exporting a CSV file and processing it with a C program, and then reading the CSV with Solr :( -- Cheers Jules.
Re: i'm a newb: questions about schema.xml
This is key: managed-schema. You've managed to get things started with the managed schema. Therefore, you need to use the REST API to add/subtract/multiply/divide. This is different from schemaless, although it _is_ related. And they're both different from having a schema.xml to edit. Or start over _without_ a managed schema; not quite sure how you started that way in the first place ;). You may have used bin/solr start -e schemaless when you started and maybe forgot? Here's a place to start: https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig Best, Erick

On Thu, Mar 26, 2015 at 4:41 PM, Mark Bramer mbra...@esri.com wrote: Hi Shawn, Definitely helpful to know about the instance and files stuff in Admin. I'm not running cloud, so I looked in the /conf directory but there's no schema.xml. Here's what's in my core's Files: currency.xml elevate.xml lang params.json protwords.txt solrconfig.xml stopwords.txt synonyms.txt

and echoed by ls -l:

-rw-r--r-- 1 root root 3974 Feb 15 11:38 currency.xml
-rw-r--r-- 1 root root 1348 Feb 15 11:38 elevate.xml
drwxr-xr-x 2 root root 4096 Mar 23 10:46 lang
-rw-r--r-- 1 root root 29733 Mar 23 18:04 managed-schema
-rw-r--r-- 1 root root 308 Feb 15 11:38 params.json
-rw-r--r-- 1 root root 873 Feb 15 11:38 protwords.txt
-rw-r--r-- 1 root root 60591 Feb 15 11:38 solrconfig.xml
-rw-r--r-- 1 root root 781 Feb 15 11:38 stopwords.txt
-rw-r--r-- 1 root root 1119 Feb 15 11:38 synonyms.txt

-Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Thursday, March 26, 2015 7:28 PM To: solr-user@lucene.apache.org Subject: Re: i'm a newb: questions about schema.xml

On 3/26/2015 4:57 PM, Mark Bramer wrote: I'm a Solr newb. I've been poking around for several days on my own test instance, and also online at the info available. But one thing just isn't jiving and I can't put my finger on why. I've searched many many times but I don't see what I'm looking for, so I'm thinking perhaps I have a fundamental semantic misunderstanding of something somewhere. Everywhere I read, everyone talks about schema.xml and how important it is. I fully get what it's for, but I don't get where it is, how it's used (by me), how I edit it, and how I create new indexes once I've edited it. I've installed, and am successfully running, Solr 5.0.0 on Linux. I've followed the widely recommended quick start at http://lucene.apache.org/solr/quickstart.html. I get through it fine, I post a bunch of stuff, I use the web UI to query for, and see, data I would expect to see. Should I now have a schema.xml file somewhere that is somehow connected to my new index? If so, where is it? Was it present from install or did it get created when I made my first core (bin/solr create -c ati_docs)?

[root@machine solr-5.0.0]# find -name schema.xml
./example/example-DIH/solr/tika/conf/schema.xml
./example/example-DIH/solr/rss/conf/schema.xml
./example/example-DIH/solr/solr/conf/schema.xml
./example/example-DIH/solr/db/conf/schema.xml
./example/example-DIH/solr/mail/conf/schema.xml
./server/solr/configsets/basic_configs/conf/schema.xml
./server/solr/configsets/sample_techproducts_configs/conf/schema.xml
[root@machine solr-5.0.0]#

Is it the one in /configsets/basic_configs/conf? Is that the default one? If I want to 'modify' schema.xml to do some different indexing/analyzing, how do I start? Make a copy of that schema.xml, move it somewhere else and modify it? If so, how do I create a new index using this schema.xml? Or am I running in schemaless mode? I don't think I am, because it appears that I would have to specifically state this as a command line parameter, i.e. bin/solr start -e schemaless. What fundamentals am I missing? I'm coming to Solr from Elasticsearch, and I've already recognized some differences. Is my ES background clouding my grasp of Solr fundamentals?

Hopefully you know what core you are using, so you can go to the admin UI and find it in the Core Selector dropdown list. Assuming you can do that, you will find yourself looking at the Overview tab for that core.

https://cwiki.apache.org/confluence/display/solr/Using+the+Solr+Administration+User+Interface

Once you are looking at the core overview, in the upper right corner of your browser window is a section called Instance ... which has an entry that is ALSO called Instance. Inside the directory indicated by that field, you should have a conf directory. The config and schema for that index are found in that conf directory. If you're running SolrCloud, then you can forget everything I just said ... the active configs will be found within the zookeeper database, and you can use the Cloud->Tree tab in the admin UI to find your collections and see which configName is linked to each one. You'll want to become familiar with the zkcli script in server/scripts/cloud-scripts.
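The REST API Erick mentions is the Schema API; with a mutable managed schema, adding a field is an HTTP POST rather than a file edit. A sketch, using the core name from this thread and a hypothetical field:

  curl -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field": {"name":"title_txt", "type":"text_general", "indexed":true, "stored":true}
  }' 'http://localhost:8983/solr/ati_docs/schema'

The change is persisted back into the managed-schema file that appears in the directory listing above.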
RE: i'm a newb: questions about schema.xml
Pretty sure I found what I am looking for: https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig I noticed the managed-schema file, and a couple of Google searches with that finally landed me at that link. Interesting that the file is hidden from the Files list in the Admin UI. Thanks! -Original Message- From: Mark Bramer Sent: Thursday, March 26, 2015 7:42 PM To: 'solr-user@lucene.apache.org' Subject: RE: i'm a newb: questions about schema.xml snip
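For anyone else who lands here: a minimal sketch (untested) of going back to a classic, hand-edited schema on a non-cloud core, as that wiki page describes. Rename managed-schema to schema.xml in the core's conf directory and set the schema factory in solrconfig.xml:

<schemaFactory class="ClassicIndexSchemaFactory"/>

With ManagedIndexSchemaFactory, which the configset used by bin/solr create in 5.0 sets up by default, the schema lives in the managed-schema file and is meant to be edited through the Schema API instead, which is why no schema.xml shows up in the Files list.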
Re: Custom TokenFilter
Hi Erick, For me, this ClassCastException is caused by the wrong use of TokenFilter. In the fieldType declaration (schema.xml), I've put: <tokenizer class="com.tamingtext.texttamer.solr.SentenceTokenizerFactory"/> And instead of extending TokenizerFactory in my class, I extend TokenFilterFactory, like this: public class SentenceTokenizerFactory extends TokenFilterFactory So when Solr tries to load my class, it expects a TokenizerFactory but gets a TokenFilterFactory. Regards, Andry
On Thursday, March 26, 2015 at 4:13 AM, Erick Erickson erickerick...@gmail.com wrote: Thanks for letting us know the resolution, the problem was bugging me Erick
On Wed, Mar 25, 2015 at 4:21 PM, Test Test andymish...@yahoo.fr wrote: Re, Finally, I think I found where this problem comes from. I didn't extend the right class: instead of a Tokenizer, I'm using a TokenFilter. Erick, thanks for your replies. Regards.
On Wednesday, March 25, 2015 at 11:55 PM, Test Test andymish...@yahoo.fr wrote: Re, I have tried to remove all the redundant jar files. Then I relaunched it, but it blocked directly on the same issue. It's very strange. Regards,
On Wednesday, March 25, 2015 at 11:31 PM, Erick Erickson erickerick...@gmail.com wrote: Wait, you didn't put, say, lucene-core-4.10.2.jar into your contrib/tamingtext/dependency directory, did you? That means you have Lucene (and Solr and SolrJ and ...) in your classpath twice, since they're _already_ in your classpath by default because you're running Solr. All your jars should be in your aggregate classpath exactly once. Having them in twice would explain the cast exception. You do not need these in the tamingtext/dependency subdirectory, just the things that are _not_ in Solr already. Best, Erick
On Wed, Mar 25, 2015 at 12:21 PM, Test Test andymish...@yahoo.fr wrote: Re, Sorry about the image. So, all my dependency jars are in the listing below: - commons-cli-2.0-mahout.jar - commons-compress-1.9.jar - commons-io-2.4.jar - commons-logging-1.2.jar - httpclient-4.4.jar - httpcore-4.4.jar - httpmime-4.4.jar - junit-4.10.jar - log4j-1.2.17.jar - lucene-analyzers-common-4.10.2.jar - lucene-benchmark-4.10.2.jar - lucene-core-4.10.2.jar - mahout-core-0.9.jar - noggit-0.5.jar - opennlp-maxent-3.0.3.jar - opennlp-tools-1.5.3.jar - slf4j-api-1.7.9.jar - slf4j-simple-1.7.10.jar - solr-solrj-4.10.2.jar I have put them into a specific directory (contrib/tamingtext/dependency), and the jar containing my class into another directory (contrib/tamingtext/lib). I added these paths in solrconfig.xml: <lib dir="../../../contrib/tamingtext/lib" regex=".*\.jar" /> <lib dir="../../../contrib/tamingtext/dependency" regex=".*\.jar" /> Thanks in advance. Regards.
On Wednesday, March 25, 2015 at 8:18 PM, Test Test andymish...@yahoo.fr wrote: Re, Sorry about the image. So, all my dependency jars are in the listing below: snip Thanks in advance. Regards.
On Wednesday, March 25, 2015 at 5:12 PM, Erick Erickson erickerick...@gmail.com wrote: Images don't come through the mailing list, can't see your image. Whether or not all the jars in the directory you're working on are consistent is the least of your problems. Are the libs to be found in any _other_ place specified on your classpath? Best, Erick
On Wed, Mar 25, 2015 at 12:36 AM, Test Test andymish...@yahoo.fr wrote: Thanks Erick, I'm working on Solr 4.10.2 and all my dependency jars seem to be compatible with this version. [image: Inline image] I can't figure out which one causes this issue. Thanks. Regards,
On Tuesday, March 24, 2015 at 11:45 PM, Erick Erickson erickerick...@gmail.com wrote: bq: 13 more Caused by: java.lang.ClassCastException: class com.tamingtext.texttamer.solr. This usually means you have jar files from different versions of Solr in your classpath. Best, Erick
On Tue, Mar 24, 2015 at 2:38 PM, Test Test andymish...@yahoo.fr wrote: Hi there, I'm trying to create my own TokenizerFactory (from the Taming Text book). After setting schema.xml and adding the path in solrconfig.xml, I start Solr. I have
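For anyone else who hits this: the schema element has to match the factory's base class. A minimal sketch (untested; the text_sentence name is invented, the factory class comes from this thread): a TokenizerFactory subclass belongs in the tokenizer element, while a TokenFilterFactory subclass belongs in a filter element after a real tokenizer:

<fieldType name="text_sentence" class="solr.TextField">
  <analyzer>
    <!-- any real tokenizer; the custom class below is a filter, not a tokenizer -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="com.tamingtext.texttamer.solr.SentenceTokenizerFactory"/>
  </analyzer>
</fieldType>

Declaring a TokenFilterFactory subclass inside <tokenizer .../>, as in the original schema.xml, is exactly what produces the ClassCastException Andry saw.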
SolrCloud -- Blocking access to administration commands while keeping the solr internal communication
Hello there, There are many blogs discussing this issue, but it is hard to find whether anyone has managed to resolve it. We have many nodes in the SolrCloud; implementing the iptables restriction would fill the iptables with many rules, which will affect performance. We are using 4.3.10, on Tomcat 5.
Re: i'm a newb: questions about schema.xml
On 3/26/2015 4:57 PM, Mark Bramer wrote: I'm a Solr newb. snip What fundamentals am I missing? I'm coming to Solr from Elasticsearch, and I've already recognized some differences. Is my ES background clouding my grasp of Solr fundamentals? Hopefully you know what core you are using, so you can go to the admin UI and find it in the Core Selector dropdown list. Assuming you can do that, you will find yourself looking at the Overview tab for that core. https://cwiki.apache.org/confluence/display/solr/Using+the+Solr+Administration+User+Interface Once you are looking at the core overview, in the upper right corner of your browser window is a section called Instance ... which has an entry that is ALSO called Instance. Inside the directory indicated by that field, you should have a conf directory. The config and schema for that index are found in that conf directory. If you're running SolrCloud, then you can forget everything I just said ... the active configs will be found within the zookeeper database, and you can use the Cloud-Tree tab in the admin UI to find your collections and see which configName is linked to each one. You'll want to become familiar with the zkcli script in server/scripts/cloud-scripts. https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities Whether it is SolrCloud or not, you can always LOOK at your configs right in the admin UI -- click on the Files tab after you select the core from the selector. Thanks, Shawn
Re: Build index from Oracle, adding fields
On 3/26/2015 5:19 PM, Julian Perry wrote: I have an index with, say, 10 fields. I load that index directly from Oracle - data-config.xml using JDBC. snip There is incremental loading - but I think that replaces whole rows rather than updating individual fields. Or maybe it does do both? If those child tables do not have a large number of entries, you can configure caching on the inner entities so that the information doesn't need to actually be requested from the database server. If there are a large number of entries, then that may not be possible due to memory constraints. https://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor If that's not practical, then the only real option you have is to drop back to one entity, and build a single SELECT statement (using JOIN and some form of CONCAT) that will gather all the information from all the tables at the same time, and combine multiple values together into one SQL result field with some kind of delimiter. Then you can use the RegexTransformer's splitBy functionality to turn the concatenated data back into multiple values for your multi-valued field. Database servers tend to be REALLY good at JOIN operations, so the database would be doing the heavy lifting. https://wiki.apache.org/solr/DataImportHandler#RegexTransformer Solr does have an equivalent concept to SQL's UPDATE, but there are enough caveats to using it that it may not be a good option: https://wiki.apache.org/solr/Atomic_Updates Thanks, Shawn
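In case it helps to see it concretely, a rough sketch of that single-entity approach for the document element of data-config.xml (untested; table and column names are invented, and LISTAGG needs Oracle 11gR2 or later). The SQL glues the child rows into one pipe-delimited string, and RegexTransformer's splitBy cuts it back into multiple values:

<entity name="main" transformer="RegexTransformer"
        query="SELECT m.id, m.title,
                      (SELECT LISTAGG(c.tag, '|') WITHIN GROUP (ORDER BY c.tag)
                         FROM child_tags c WHERE c.main_id = m.id) AS tags
                 FROM main_table m">
  <field column="id" name="id"/>
  <field column="title" name="title"/>
  <!-- splitBy takes a regex, hence the escaped pipe -->
  <field column="tags" name="tags" splitBy="\|"/>
</entity>

One query, one pass over the main table, and the database does the heavy lifting.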
RE: i'm a newb: questions about schema.xml
Hi Shawn, Definitely helpful to know about the instance and files stuff in Admin. I'm not running cloud, so I looked in the /conf directory, but there's no schema.xml. Here's what's in my core's Files: currency.xml elevate.xml lang params.json protwords.txt solrconfig.xml stopwords.txt synonyms.txt and echoed by ls -l:
-rw-r--r-- 1 root root  3974 Feb 15 11:38 currency.xml
-rw-r--r-- 1 root root  1348 Feb 15 11:38 elevate.xml
drwxr-xr-x 2 root root  4096 Mar 23 10:46 lang
-rw-r--r-- 1 root root 29733 Mar 23 18:04 managed-schema
-rw-r--r-- 1 root root   308 Feb 15 11:38 params.json
-rw-r--r-- 1 root root   873 Feb 15 11:38 protwords.txt
-rw-r--r-- 1 root root 60591 Feb 15 11:38 solrconfig.xml
-rw-r--r-- 1 root root   781 Feb 15 11:38 stopwords.txt
-rw-r--r-- 1 root root  1119 Feb 15 11:38 synonyms.txt
-Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Thursday, March 26, 2015 7:28 PM To: solr-user@lucene.apache.org Subject: Re: i'm a newb: questions about schema.xml snip
i'm a newb: questions about schema.xml
Hello, I'm a Solr newb. I've been poking around for several days on my own test instance, and also online at the info that's available. But one thing just isn't jiving and I can't put my finger on why. I've searched many many times but I don't see what I'm looking for, so I'm thinking perhaps I have a fundamental semantic misunderstanding of something somewhere. Everywhere I read, everyone talks about schema.xml and how important it is. I fully get what it's for, but I don't get where it is, how it's used (by me), how I edit it, and how I create new indexes once I've edited it. I've installed, and am successfully running, Solr 5.0.0 on Linux. I've followed the widely recommended-by-all quick start at: http://lucene.apache.org/solr/quickstart.html. I get through it fine, I post a bunch of stuff, I use the web UI to query for, and see, data I would expect to see. Should I now have a schema.xml file somewhere that is somehow connected to my new index? If so, where is it? Was it present from install or did it get created when I made my first core (bin/solr create -c ati_docs)?
[root@machine solr-5.0.0]# find -name schema.xml
./example/example-DIH/solr/tika/conf/schema.xml
./example/example-DIH/solr/rss/conf/schema.xml
./example/example-DIH/solr/solr/conf/schema.xml
./example/example-DIH/solr/db/conf/schema.xml
./example/example-DIH/solr/mail/conf/schema.xml
./server/solr/configsets/basic_configs/conf/schema.xml
./server/solr/configsets/sample_techproducts_configs/conf/schema.xml
[root@machine solr-5.0.0]#
Is it the one in /configsets/basic_configs/conf? Is that the default one? If I want to 'modify' schema.xml to do some different indexing/analyzing, how do I start? Make a copy of that schema.xml, move it somewhere else and modify it? If so, how do I create a new index using this schema.xml? Or am I running in schemaless mode? I don't think I am, because it appears that I would have to specifically state this as a command line parameter, i.e. bin/solr start -e schemaless What fundamentals am I missing? I'm coming to Solr from Elasticsearch, and I've already recognized some differences. Is my ES background clouding my grasp of Solr fundamentals? Thanks for any help. Mark Bramer | Technical Team Lead, DC Services Esri | 8615 Westwood Center Dr | Vienna, VA 22182 | USA T 703 506 9515 x8017 | mbra...@esri.com | esri.com
Re: SolrCloud -- Blocking access to administration commands while keeping the solr internal communication
On 3/26/2015 3:38 PM, Oded Sofer wrote: There are many blogs discussing this issue, but it is hard to find whether anyone has managed to resolve it. We have many nodes in the SolrCloud; implementing the iptables restriction would fill the iptables with many rules, which will affect performance. We are using 4.3.10, on Tomcat 5. Because Solr is a webapp, it relies on software outside itself to provide network and protocol (HTTP) communication. In your case, that software is Tomcat. For others, it is Jetty, JBoss, Weblogic, or one of several other possibilities. This means that there are many things that are impossible (or extremely difficult) for Solr to handle within its own code. Security is one of them. This is one of the major reasons that Solr will become a true application at some point in the future. When Solr can control the network and the HTTP server, we will be able to restrict access to the admin UI separately from access to the query interface, the update interface, replication, etc. As far as your iptables rule list ... are your Solr servers contained within discrete IP address blocks that could be added to the rule list as subnets instead of individual addresses? Ideally you will handle complicated access controls on edge firewalls or as ACLs on internal routing devices, not at the host level. Thanks, Shawn
Build index from Oracle, adding fields
Hi I have looked and cannot see any clear answers to this on the Interwebs. I have an index with, say, 10 fields. I load that index directly from Oracle - data-config.xml using JDBC. I can load 10 million rows very quickly. This direct way of loading from Oracle straight into SOLR is fantastic - really efficient and saves writing loads of import/export code (e.g. via a CSV file). Of those 10 fields - two of them (set to multiValued) come from a separate table and there are anything from 1 to 10 rows per row from the main table. I can use a nested entity to extract the child rows for each of the 10m rows in the main table - but then SOLR generates 10m separate SQL calls - and the load time goes from a few minutes to several days. On smaller tables - just a few thousand rows - I can use a second nested entity with a JDBC call - but not for very large tables. Could I load the data in two steps: 1) load the main 10m rows 2) load into the existing index by adding the data from a second SQL call into fields for each existing row (i.e. an UPDATE instead of an INSERT). I don't know what syntax/option might achieve that. There is incremental loading - but I think that replaces whole rows rather than updating individual fields. Or maybe it does do both? Any other techniques that would be fast/efficient? Help! -- Cheers Jules.
Re: Solr Monitoring - Stored Stats?
Matt, SPM will give you all that out of the box with alerts, anomaly detection etc. See http://sematext.com/spm Otis On Mar 25, 2015, at 11:26, Matt Kuiper matt.kui...@issinc.com wrote: Hello, I am familiar with the JMX points that Solr exposes to allow for monitoring of statistics like QPS, numdocs, Average Query Time... I am wondering if there is a way to configure Solr to automatically store the value of these stats over time (for a given time interval), and then allow a user to query a stat over a time range. So for the QPS stat, the query might return a set that includes the QPS value for each hour in the time range specified. Thanks, Matt
Re: Data indexing is going too slow on single shard Why?
Great, thanks Shawn... As you said - **For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB**. Therefore, if I have 204GB of data on a single server/shard, then 256GB is preferred, with which searching will be fast and never slow down. Is that right? On Wed, Mar 25, 2015 at 9:50 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/25/2015 8:42 AM, Nitin Solanki wrote: Server configuration: 8 CPUs. 32 GB RAM O.S. - Linux snip are running. Java heap set to 4096 MB in Solr. While indexing, snip *Currently*, I have 1 shard with 2 replicas using SOLR CLOUD. Data Size: 102G solr/node1/solr/wikingram_shard1_replica2 102G solr/node2/solr/wikingram_shard1_replica1 If both of those are on the same machine, I'm guessing that you're running two Solr instances on that machine, so there's 8GB of RAM used for Java. That means you have about 24 GB of RAM left for caching ... and 200GB of index data to cache. 24GB is not enough to cache 200GB of index. If there is only one Solr instance (leaving 28GB for caching) with 102GB of data on the machine, it still might not be enough. See that SolrPerformanceProblems wiki page I linked in my earlier email. For 102GB of data per server, I recommend at least 64GB of total RAM, preferably 128GB. For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB. Thanks, Shawn
Re: Data indexing is going too slow on single shard Why?
On 3/26/2015 12:03 AM, Nitin Solanki wrote: Great, thanks Shawn... As you said - **For 204GB of data per server, I recommend at least 128GB of total RAM, preferably 256GB**. Therefore, if I have 204GB of data on a single server/shard, then 256GB is preferred, with which searching will be fast and never slow down. Is that right? Obviously I cannot guarantee it, but I think it's extremely likely that with that much memory, performance will be very good. One other possibility, which is discussed on that wiki page I linked, is that your Java heap is almost exhausted and large amounts of time are spent in garbage collection. If you increase the heap from 4GB to 5GB and see performance get better, then that would be confirmed. There would be less memory available for caching, but constant garbage collection would be a much greater problem than the disk cache being too small. Thanks, Shawn
Re: Applying Tokenizers and Filters to CopyFields
Thanks so much, Erick and Michael, for all the additional explanation. The crucial piece of information in the end turned out to be the one about the Default Search Field („df“). In solrconfig.xml, this parameter was set to point to the original text, which is why the expanded queries didn't work. When I set the df parameter to one of the fields with the expanded text, the search works fine. I have also removed the copyField declarations. It's all working as expected now. Thanks again for the help. Cheers, Martin
On 25.03.2015 at 23:43, Erick Erickson erickerick...@gmail.com wrote: Martin: Perhaps this would help:
indexed=true, stored=true: the field can be searched. The raw input (not analyzed in any way) can be shown to the user in the results list.
indexed=true, stored=false: the field can be searched. However, the field can't be returned in the results list with the document.
indexed=false, stored=true: the field cannot be searched, but the contents can be returned in the results list with the document. There are some use-cases where this is desirable behavior.
indexed=false, stored=false: the entire field is thrown out; it's just as if you didn't send the field to be indexed at all.
And one other thing, the copyField gets the _raw_ data, not the analyzed data. Let's say you have two fields, src and dst. Copying from src to dst in schema.xml is identical to <add> <doc> <field name="src">original text</field> <field name="dst">original text</field> </doc> </add> that is, copyField directives are not chained. Also, watch out for your query syntax. Michael's comments are spot-on, I'd just add this: http://localhost:8983/solr/windex/select?q=Sprache&fq=original&wt=json&indent=true is kind of odd. Let's assume you mean qf rather than fq. That _only_ matters if your query parser is edismax; it'll be ignored in this case, I believe. You'd want something like q=src:Sprache or q=dst:Sprache or even http://localhost:8983/solr/windex/select?q=Sprache&df=src http://localhost:8983/solr/windex/select?q=Sprache&df=dst where df is default field and the search is applied against that field in the absence of a field qualification like my first two examples. Best, Erick
On Wed, Mar 25, 2015 at 2:52 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I agree the terminology is possibly a little confusing. Stored refers to values that are stored verbatim. You can retrieve them verbatim. Analysis does not affect stored values. Indexed values are tokenized/transformed and stored inverted. You can't recover the literal analyzed version (at least, not easily). If what you really want is to store and retrieve case-folded versions of your data as well as the original, you need to use something like an UpdateRequestProcessor, which I personally am less familiar with.
On Wed, Mar 25, 2015 at 5:28 PM, Martin Wunderlich martin...@gmx.net wrote: So, the pre-processing steps are applied under <analyzer type="index">. And this point is not quite clear to me: Assuming that I have a simple case-folding step applied to the target of the copyField: How or where are the lower-case tokens stored, if the text isn't added to the index? How is the query supposed to retrieve the lower-case version? (sorry if this sounds like a naive question, but I have a feeling that I am missing something really basic here).
Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc.
“The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/
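For reference, a minimal sketch of the df change Martin describes (the field name text_expanded is invented for illustration): in solrconfig.xml, set df in the /select handler's defaults:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- queries with no explicit field are parsed against this field -->
    <str name="df">text_expanded</str>
  </lst>
</requestHandler>

After that, a bare query like q=Sprache is searched against text_expanded unless the request names a field explicitly.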
RE: Solr Monitoring - Stored Stats?
Erick, Shawn, Thanks for your responses. I figured this was the case, just wanted to check to be sure. I have used Zabbix to configure JMX points to monitor over time, but it was a bit of work to get configured. We are looking to create a simple dashboard of a few stats over time. Looks like the easiest approach will be to make an app that calls for these stats at a regular interval and then indexes the results to Solr; then we will be able to query over desired time frames... Thanks, Matt -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 25, 2015 10:30 AM To: solr-user@lucene.apache.org Subject: Re: Solr Monitoring - Stored Stats? Matt: Not really. There's a bunch of third-party log analysis tools that give much of this information (not everything exposed by JMX is in the log files, of course). Not quite sure whether things like Nagios, Zabbix and the like have this kind of stuff built in; seems like a natural extension of those kinds of tools though. Not much help here... Erick On Wed, Mar 25, 2015 at 8:26 AM, Matt Kuiper matt.kui...@issinc.com wrote: Hello, I am familiar with the JMX points that Solr exposes to allow for monitoring of statistics like QPS, numdocs, Average Query Time... snip
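One more option, for what it's worth: the same statistics are exposed over HTTP by the mbeans handler, which is easy for a polling app to hit on a schedule without wiring up JMX (host, port, and core name below are placeholders):

http://localhost:8983/solr/corename/admin/mbeans?stats=true&wt=json

The response carries the stats as JSON, ready to be timestamped and indexed into a separate stats core.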
Re: Applying Tokenizers and Filters to CopyFields
Glad you are sorted out! Michael Della Bitta Senior Software Engineer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Thu, Mar 26, 2015 at 10:09 AM, Martin Wunderlich martin...@gmx.net wrote: snip
Replacing a group of documents (Delete/Insert) without a query on the index ever showing an empty list (Docs)
Hi, I have an index which is made up of groups of documents; each group is defined by a field called keyField (e.g. keyField:A). I need to delete all the keyField:A documents and replace them with a brand new set, without the index ever returning zero documents on a query. At the moment I deleteByQuery keyField:A and then insert a SolrInputDocument list via SolrJ into my index. I have a small time period where somebody doing q=keyField:A can be returned an empty list. FYI: the keyField group might be just 100 documents or up to 10 million. Any help much appreciated. Thanks, Russ. Index example docs: [ { keyField:A ... lastField:xyz }, { keyField:A ... lastField:xyz }, { keyField:B ... lastField:xyz }, { keyField:A ... lastField:xyz }, { keyField:B ... lastField:xyz } ]
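One pattern that might close that window, sketched here (untested; the field values are invented): avoid committing between the delete and the adds, since searchers only see changes when a commit opens a new searcher. Assuming autoSoftCommit is disabled (or its interval is longer than the whole reload takes), send the delete without a commit:

<delete><query>keyField:A</query></delete>

then the replacement documents:

<add>
  <doc>
    <field name="id">A-1</field>
    <field name="keyField">A</field>
    <field name="lastField">xyz</field>
  </doc>
</add>

and only then a single <commit/>. Queries keep returning the old keyField:A documents right up to the commit, at which point they flip to the new set in one step.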